PREFACE

Project MICHIGAN is a continuing, long-range research and development program
for advancing the Army's combat-surveillance and target-acquisition capabilities.
The program is carried out by a full-time Institute of Science and Technology
staff of specialists in the fields of physics, engineering, mathematics, and psychology,
by members of the teaching faculty, by graduate students, and by other research
groups and laboratories of The University of Michigan.

The emphasis of the Project is upon research in imaging radar, MTI radar, infrared,
radio location, image processing, and special investigations. Particular attention
is given to all-weather, long-range, high-resolution sensory and location
techniques.

Project MICHIGAN was established by the U. S. Army Signal Corps at The University
of Michigan in 1953 and has received continuing support from the U. S. Army.
The Project constitutes a major portion of the diversified program of research
conducted by the Institute of Science and Technology in order to make available to
government and industry the resources of The University of Michigan and to broaden
the educational opportunities for students in the scientific and engineering disciplines.

Progress and results described in reports are continually reassessed by Project
MICHIGAN. Comments and suggestions from readers are invited.

Robert  L.  Hess 

Director 
a  a constant

C  convergence, a measure of the subject's response (see page 11)

$\bar{C}$  no convergence, a measure of the subject's response (see page 11)

D  detection, a measure of the subject's response (see page 11)

$\bar{D}$  no detection, a measure of the subject's response (see page 11)

e(n)  a model's error at n, $e(n) = r(n) - P_n$

$\overline{e^2(n)}$  the expected value of $e^2(n)$, an ensemble average

$\langle e^2 \rangle_A$  the average value of $e^2(n)$ over some set of samples A

$e_M(n)$  the descriptive model's error at n, $e_M(n) = r(n) - P_n$

$e_{MS}(n)$  the error (i.e., disagreement) between the subject and the descriptive model at n, $e_{MS}(n) = R_n - r(n)$

$e_S(n)$  the subject error at n, $e_S(n) = R_n - P_n$

$\langle e_S^2 \rangle_P$, $\langle e_M^2 \rangle_P$, $\langle e_M e_{MS} \rangle_P$  the average values of $e_S^2(n)$, $e_M^2(n)$, and $e_M(n) e_{MS}(n)$ over a problem

E(x)  the  expected  value  of  x 

FAR  false  alarm  rate,  a  measure  of  the  subject's  response  (see  page  12) 
fps  flashes  per  second,  the  flash  rate 

IC  initial  convergence,  a  measure  of  the  subject’s  response  (see  page  11) 

k  fl^i-rV2 

$k_1$  the decision criterion level in the descriptive model

$k_2$  the fractional response adjustment in the descriptive model

$k_3$  the number of flashes in u(n) in the descriptive model

$k_4$  the flash shift between the subject's and the descriptive model's responses

M  the last flash in a subproblem

$M_i$  the length of subproblem i

MEC  mean  error  after  convergence,  a  measure  made  on  the  subject's  response  (see  page  12) 
n  a  flash  index,  n  =  1  is  the  first  flash  in  a  subproblem 

N  the  total  number  of  flashes  in  the  model's  summation 

P  the  probability  of  a  1  in  a  0,1  binary  series 

$P_i$  the probability of a 1 in the binary series i

Pr(x)  the  probability  of  x 

r  the  geometric  ratio  in  the  geometric  model 
CONTINUOUS HUMAN ESTIMATION OF A TIME-VARYING,
SEQUENTIALLY DISPLAYED PROBABILITY


ABSTRACT 

This experiment examines the human ability to give a direct magnitude estimate
of a time-varying probability. The subject positioned a "tracking" lever at his
estimate of the current mean of a sequentially displayed binary distribution. The
distribution samples were presented at a fixed rate by two flashing lights. The distribution
mean changed in step increments of varying size and spacing. The experimental
variables included flash rate and a constraint on the randomness of the flash series.

Detailed measures were made of both the transient and static responses to each
step change. The transient response was more rapid and consistent than had been
anticipated and occurred with step changes as small as 0.12. The average static
response showed no systematic bias as a function of probability and had an RMS error
approximately equal to that of a 17-sample average.

Two simple mathematical models are derived to provide quantitative comparisons
with the subjects' data. A descriptive model is also derived which satisfies
some basic properties of the task behavior. The parameters for this model are
selected for two specific experimental situations.


1 

INTRODUCTION 

Human  decision  tasks  can  be  described  as  static  or  dynamic.  In  a  dynamic  decision  task, 
some  of  the  relevant  stimuli  vary  as  a  function  of  time,  or  of  past  decisions,  or  of  both.  The 
decision  maker  must  keep  track  of  these  changes  in  order  to  perform  satisfactorily. 

This experiment examines the human ability to follow or estimate a time-varying probability,
which could be an important input to a dynamic decision task. The experiment attempts to
isolate the estimation of the probability from the use of the estimate in making decisions. The
task selected was the estimation of the mean of a binary (Bernoulli) distribution. Samples (0 or
1) from the distribution were displayed sequentially, and the subject continuously estimated the
distribution mean, which varied with time. The experiment is described in detail in Section 2.

The study of probability estimation isolated from decision making is important for two
reasons. First, in decision making under uncertainty, the estimation of probabilities is always
at least an implicit part of the task. A decision maker's ability to produce decisions which
maximize expected value will depend directly on his ability to estimate the probabilities of the
various alternative courses of action.
Second, there is an applied interest in the human ability to turn uncertainty into probability.
In many systems involving stochastic inputs, it is relatively easy to automate the application of
decision rules. It is far harder, however, to find automatic means for supplying the probabilities
and payoffs necessary for the application of the rules. Probability estimation, then, is a
candidate for inclusion as a human task in semiautomatic information processing and
decision-making systems in which the subsequent choice of a course of action is performed automatically.

RESEARCH  ON  ESTIMATION  AND  PREDICTION 

Human binary-choice behavior has been studied extensively. This experiment complements
previous studies by isolating the estimation function and by using changing probabilities.

Most studies of human binary choice do not include estimation as an explicit part of the
subjects' task. These studies usually generate prediction data, which are then averaged over
blocks of trials (decisions, choices) to produce prediction frequencies. A prediction frequency
of 0.67 on trials 121 through 150 would indicate that the predictions during these 30 trials were
distributed about two to one between the two choices. The subject may or may not be told the
correct choice after each trial. The correct choices are drawn in some manner from a
stationary binary distribution.
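The block averaging described above can be sketched as follows. This is only an illustration: the function name, the 0/1 coding of the two choices, and the sample data are assumptions, not taken from any of the cited studies.

```python
def block_prediction_frequencies(predictions, block_size=30):
    """Return the fraction of 1-predictions in each consecutive full block of trials."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    frequencies = []
    # Step through the record one full block at a time; a trailing partial block is ignored.
    for start in range(0, len(predictions) - block_size + 1, block_size):
        block = predictions[start:start + block_size]
        frequencies.append(sum(block) / block_size)
    return frequencies


# Hypothetical record of 150 predictions. In the last block (trials 121 to 150)
# twenty of the thirty predictions are 1, i.e. about two to one.
predictions = [0] * 120 + [1, 1, 0] * 10
frequencies = block_prediction_frequencies(predictions)
print(frequencies[-1])
```

A frequency near 0.67 in a block therefore says nothing about the order of the predictions inside the block, only about their proportion.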

Examples of these experiments, often called probability learning experiments, are reported
by Grant, 1953 [1]; Hake and Hyman, 1953 [2]; Hake, 1954 [3]; Estes, 1957 [4]; and Neimark and
Shuford, 1959 [5]. Most of these studies report prediction frequencies asymptotically approaching
the frequency of correct choice or the generating probability. This phenomenon has been
named "probability matching." This behavior is not optimum. The optimum strategy, under
instructions to maximize correct choices, is to predict consistently the more probable event.
This event can be inferred from the relative frequency of previous events.
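Why matching is not optimum can be seen from a short calculation. The sketch below assumes independent events with generating probability p and a matching subject whose predictions are independent of the events; both function names are illustrative.

```python
def expected_accuracy_matching(p):
    """Expected fraction correct when event 1 occurs with probability p and is
    predicted with probability p, independently of the event."""
    return p * p + (1.0 - p) * (1.0 - p)


def expected_accuracy_maximizing(p):
    """Expected fraction correct when the more probable event is always predicted."""
    return max(p, 1.0 - p)


for p in (0.50, 0.67, 0.90):
    print(p, round(expected_accuracy_matching(p), 4), expected_accuracy_maximizing(p))
```

Under these assumptions the two strategies coincide only at p = 0.50 (and trivially at 0 or 1); for any other p, always predicting the more probable event gives the higher expected score.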

Behavior significantly different from matching has been reported by Gardner, 1959 [6], and
Edwards, 1961 [7]. The number of trials may have been insufficient in some of the experiments
in which matching was found. An unpublished experiment by Tannenbaum and Edwards at The
University of Michigan indicates that the amount of reward for a correct choice interacts with
the prediction frequency. Some subjects used near-optimum strategies.

A few studies have looked at the estimation ability of the binary decision maker. Grant [1]
reports an experiment by Hornseth in which the subject was asked to guess, at the end of 150
choice trials, which event had been the more frequent. The prediction frequencies for the last
30-trial block were close to the matching level. The data on guessing the overall frequency
were plotted as the percentage of correct guesses. These data showed that the percentage of
correct guesses at a particular event frequency was higher than the event frequency (i.e., an
event with a frequency of 0.70 would be guessed to be the more frequent over 70 percent of the time).

Grant concludes from this experiment and presumably from other probability learning
experiments that the processes of estimation and prediction are distinct, and that prediction is the
more accurate. The processes may indeed be distinct, but the accuracy of estimation was not
measured by Hornseth's experiment.

Hake [3] surveys a major portion of the probability learning literature (including
experiments by Estes, Grant, Hornseth, Hake, and Hyman), and concludes that estimation is not
accurate enough to be the basis for binary predictions.

Subjects in these experiments should have based their choices on estimates of the event
frequencies or generating probabilities. To conclude that the non-optimum performance was
an indicator of inaccurate probability estimates is unjustified, however.

Neimark and Shuford [5] included estimation as an explicit part of the task in a probability
learning experiment. Besides making a choice at each trial, some subjects were required to
estimate the proportions of the past events. The event frequency was 0.67. These subjects
gave unbiased estimates and had prediction frequencies significantly higher than the matching
level, whereas subjects who only predicted produced frequencies at the matching level. These
results suggest that explicit estimation improved prediction.

Erlick [8] looked at estimation without a decision task. He presented 100 binary events at
a rate of five per second and asked for an estimate of the more frequent event and for an actual
estimate of the event frequency on a continuous scale. Four event frequencies were used:
0.50-0.50; 0.48-0.52; 0.45-0.55; and 0.43-0.57. The data indicated that the more frequent event was
selected correctly 75% of the time when the event-frequency difference was approximately 0.08
(0.46-0.54). For 0.50 and 0.52, the median estimate of the frequency was within 0.01; for 0.55
and 0.57 the median estimate was about 0.02 high.

All  of  the  experiments  reviewed  above  used  stationary  processes  to  generate  the  binary 
events.  A  few  experiments  have  used  a  dynamic  generating  process,  but  since  prediction  was 
the  required  task  in  all  of  these,  they  give  only  indirect  evidence  on  estimation. 

Grant [1] reports an experiment in which the generating probability changed periodically
as a square wave. The probability values always differed by 0.50, with higher values of 1.00, 0.90,
0.80, and 0.70. The period was 40 events, and two and one half cycles were presented. A
prediction frequency was calculated by averaging over five trials and about 40 subjects. This
prediction frequency followed the cyclic change only when the higher probability was 1.00 or 0.90,
and reached 0.95 in 20 trials at 1.00, and 0.70 in 20 trials at 0.90. Apparently no systematic
performance changes occurred during the two and one half cycles. The subjects were evidently
not instructed about the nonstationarity of the generating process. Such instructions would
probably have had an appreciable effect.

Goodnow and Pettigrew [9] performed a binary prediction experiment in which a change
from 0.50-0.50 to 0-1.00 occurred. They found that the response to such a change was more
rapid when the subjects had experienced a 0-1.00 series prior to the 0.50-0.50 series.
Again, however, no specific instructions were given concerning the nonstationarity of the
generating process.

In both Grant's and Goodnow's experiments there is evidence that a change in the generating
probability of 0.50 will produce an appropriate change in the prediction frequency if the change
is to an extreme probability. These extreme probabilities (1.00 and 0.90) evidently represent
changes which are obvious even when the instructions induce no expectation of change.

Flood [10] discusses the strategy of a subject who may not be convinced that the
probabilities are stationary. No particular results were obtained in an experiment designed to induce
certainty versus uncertainty in the stationarity of a stationary generating process.

The human ability to estimate directly the magnitude of a stationary binary probability is
uncertain. Most experimenters have postulated estimation only as an intervening variable
between the display and a decision task. Decision behavior was improved in one experiment by
including explicit estimation in the task. Two questions seem appropriate: what role does
estimation play in a decision task, and how well can this estimation be performed? The experiment
reported here sheds light on the second question as well as providing a fairly comprehensive
look at the continuous estimation of dynamic probabilities.

2 

THE EXPERIMENT

2.1.  THE  TASK 

The task studied in this experiment was to estimate the mean of a binary distribution as
samples (individual drawings) from that distribution were sequentially displayed. This task
was selected for two reasons. First, the distribution is completely described by one parameter, its mean.
It is thus easily understood by people unfamiliar with the mathematical aspects of probability.
Second, the task can be readily related to the literature on binary decision and estimation, discussed
in Section 1.

The display and response mechanisms, shown in Figure 1, were designed for convenient
and effective control and interpretation. As the subject sat at the apparatus, samples from a
binary distribution were presented to him at a fixed rate by two flashing lights. He reported
his estimate of the mean of the distribution by moving the tracking lever so that the illuminated
dial beneath the lights would indicate it. Thus the apparatus was a continuous-response
mechanism, appropriate for the estimation of a continuously varying stimulus.

FIGURE 1. TRACKING CONSOLE. (a) Sketch. (b) Schematic. The flash had a duration of approximately
0.020 seconds. The intensity was adjusted to provide a clear indicator without glare. The room
illumination was low.

The lever was free to move between stops at 0 and 100 on the scale. The smallest scale
division was 2. The variable error of the pointer was about one half of the least division,
corresponding to a probability change of 0.01. A main scale division occurred at every fifth small
division and was marked 0, 10, 20, ..., 90, 100. The lever and associated mechanisms
contained enough Coulomb friction to retain a setting without constant force; neither springs nor
viscous friction was used.

The position of the lever was recorded by two means: Friden punched paper tape and a
Sanborn continuous-strip recorder. The paper tape was punched in a Gray, or cyclically
permuted, binary code using six channels of an eight-channel punch to encode 101 symbols, 0 through
100. Pilot studies indicated that the rate of output sampling necessary to recover the response
information depended on the flash rate, and that a response sampling rate equal to the flash
rate would be adequate. Thus a sample was taken every two seconds at the slowest presentation
rate, 0.5 flash per second, and every 0.125 second at the fastest presentation rate, 8 flashes per
second. The punched-tape record was later transferred to IBM cards on a modified IBM
tape-to-card converter, and the data analyzed on an IBM 709 data processing system. The Sanborn
records were used in making qualitative judgments about the response and in selecting
appropriate criteria for the computer analysis. They also permitted continuous monitoring of the
task as it was performed.
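The useful property of a Gray (cyclically permuted) code for recording a lever position is that adjacent readings differ in exactly one punched bit, so a sample taken while the lever is between two positions cannot be badly misread. A minimal sketch using the standard binary-reflected Gray code follows. The seven-bit width is this sketch's own choice (the smallest width that holds 0 through 100 in a binary-reflected code); the channel assignment actually used on the tape is not described in this section, and the function names are illustrative.

```python
def to_gray(value):
    """Binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def from_gray(code):
    """Invert to_gray by XOR-ing together all right shifts of the code."""
    value = 0
    while code:
        value ^= code
        code >>= 1
    return value


# Encode a few dial readings as 7-bit patterns and decode them again.
for reading in (0, 1, 2, 49, 50, 100):
    code = to_gray(reading)
    print(reading, format(code, "07b"), from_gray(code))
```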

The subject and his console were isolated in a small room. The subject wore
noise-insulating ear muffs. He had a two-way communication system with the experimenter. A low-level
white noise was presented by the earphones during the experimental run. When the experimenter
spoke to the subject, the noise was automatically switched off. The subject's microphone was
always on, and comments during the experimental run were permitted.¹

The task has a strong resemblance to a standard unidimensional manual tracking task.
The main difference is the presentation of the target: instead of being displayed explicitly as
a dot or a line, it exists only as a parametric description of the method used to select the flash
sequence. In this experiment the generating process was time-variant, and the target could be
defined as the mean of the distribution from which the last flash was drawn. It is impossible to
recover the precise target from the information available to the subjects. The cursor, or 0-100
dial pointer in this case, is pointed at an estimated value of the target. The system is essentially
open-loop, since the lack of an explicit target prohibits the formation of an error signal. The
dynamics are almost entirely in the mental computation; there was no indication that motor
skill was a limiting factor.

¹Few comments were made; most of these were not printable.

The use of a tracking lever as a response means is unique in research on probability
estimation. It is appropriate to the task and permits an easy understanding of the response scale
by the subjects. Both end points are well fixed, in the same sense that impossible events and
sure ones are fixed in value on a personal or subjective probability scale. The 50 point on the
scale might also be considered as an anchor point, since all subjects clearly understood that
50% meant equally frequent flashes.

2.2.  INPUT  SELECTION 

The input probability changed in a series of discrete steps. This input form permitted
visual, qualitative interpretations to be made from the data in addition to the more extensive
analysis done by the computer. The input form also permitted static as well as dynamic
measurements to be made. The step-change sizes and their directions, as well as the number of
flashes between steps, were selected randomly from a finite set of values. The sequence of
steps so generated is called a problem. The mechanism for the generation of the sequences
is described in detail in Appendix A.

Preliminary investigations revealed that step changes ranging from 0.06 to 0.64 in eight
values would adequately cover the interesting range of probability change.* The number of
flashes between step changes was selected from a set ranging from 34 to 80 flashes; the smallest
number of flashes required to minimize interaction between successive step changes was 34.
The range between 34 and 80 was considered sufficient to prevent any performance improvement
due to the learning of inter-step length. A step change and the flashes until the next change are
called a subproblem.

2.3.  FLASH  SERIES  GENERATION 

The flashes were drawn from finite populations without replacement. The population size
was an experimental variable and is discussed below. Finite populations were selected to fix
the average value of the flashes for each subproblem. The effects of finite population sampling
on variances are shown in Appendix B.
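As a rough guide to the size of that effect, the textbook finite population correction can be applied to the variance of a sample mean. The sketch below is not the derivation of Appendix B; it assumes a population that contains exactly a fraction p of ones, and the function name is illustrative.

```python
def variance_of_sample_mean(p, n, population_size=None):
    """Variance of the mean of n binary draws with P(1) = p.

    population_size=None models draws with replacement (an infinite population).
    Otherwise the draws come without replacement from a finite population of
    that size containing a fraction p of ones, and the standard finite
    population correction (N - n) / (N - 1) is applied.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    variance = p * (1.0 - p) / n
    if population_size is not None:
        if population_size < 2 or n > population_size:
            raise ValueError("need 2 <= population_size and n <= population_size")
        variance *= (population_size - n) / (population_size - 1)
    return variance


# Variance of a 17-flash average at p = 0.5 for several population sizes.
for size in (None, 85, 34, 17):
    print(size, variance_of_sample_mean(0.5, 17, size))
```

When the sample exhausts the population, the variance of the mean is zero; this is the sense in which a finite population fixes the average value of the flashes.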


*A pilot experiment with a simplified apparatus was run before the main console was built
in order to establish the general form of the response and reasonable ranges for the independent
variables. It is discussed in more detail in Section 3.
2.4. EXPERIMENTAL VARIABLES

Five independent variables were used in the experiment: the rate at which the flashes were
presented; the magnitude and sign of each step change; the probability after the step change; a
constraint on the randomness of the flash series; and subjects. The number of flashes between
step changes was not studied as a variable. The variables had the following values:

Rate: 0.5, 1.0, 2.0, 4.0, and 8.0 flashes per second (fps)

Step size: 0.06, 0.12, 0.16, 0.18, 0.24, 0.32, 0.48, 0.64 (both + and -)

Probability: 0.02, 0.08, 0.14, 0.18, 0.26, 0.32, 0.34, 0.44, 0.50, and the complementary
values between 0.50 and 1.00

The step changes and probability values were arranged in two problem types, described in
detail in Appendix A. For one type, the small-step problems, the mean step change is
approximately 0.15; for the other, the large-step problems, the mean step change is approximately 0.40.
Both types contain the entire range of probabilities and are symmetric about 0.50.

The constraint variable had two values, leading to the random and the constrained problem
types. The random problems were generated from finite populations which had the length of the
respective subproblems being generated. These finite populations were thus of size 35 through
89 flashes. These sizes were considered large enough to yield experimental results fairly close
to those which would result from infinite populations.

The constrained problems were generated from finite populations of 17 flashes. The lengths
of the subproblems were arranged in whole-number multiples of 17: 34, 51, 68, and 85 flashes.
It was assumed that this constraint would be sufficient to indicate those aspects of performance
that constraint would affect. It is not a severe enough constraint to be readily perceived from
inspection of the flash series, however. The same series of steps and probabilities were used
in the random and in the constrained problems.
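The difference between the two problem types can be sketched as two ways of filling one subproblem. This is an illustration under stated assumptions, not the generator of Appendix A: the function names are hypothetical, and the number of 1s in each finite population is simply rounded from p times the population size, which may not match the rule actually used.

```python
import random


def draw_block(p, size, rng):
    """Shuffle a finite population of `size` flashes containing round(p * size) ones.

    Rounding the count of ones is an assumption of this sketch.
    """
    ones = round(p * size)
    block = [1] * ones + [0] * (size - ones)
    rng.shuffle(block)
    return block


def random_subproblem(p, length, rng):
    """Random type: one finite population as long as the whole subproblem."""
    return draw_block(p, length, rng)


def constrained_subproblem(p, length, rng, block_size=17):
    """Constrained type: consecutive finite populations of 17 flashes each."""
    if length % block_size != 0:
        raise ValueError("length must be a whole-number multiple of block_size")
    flashes = []
    for _ in range(length // block_size):
        flashes.extend(draw_block(p, block_size, rng))
    return flashes


rng = random.Random(0)  # fixed seed so the example is repeatable
constrained = constrained_subproblem(0.26, 68, rng)
unconstrained = random_subproblem(0.50, 68, rng)
print([sum(constrained[i:i + 17]) for i in range(0, 68, 17)])
```

In the constrained series every aligned run of 17 flashes carries the same count of 1s, whereas the random series fixes only the total over the whole subproblem.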

Each of the four subjects performed the task in 15 sessions, and saw the same series of
problems in the same order. Each session, lasting about an hour, consisted of two or three
problems separated by a short rest period.

Rates, small- and large-step problems, subjects, and constraints were exhaustively
combined. The order of presentation was chosen at random under the constraint that the tracking
sessions were of about the same length. (Appendix C gives the sequence used.) The pilot
experiment had indicated that about 25 minutes, at two flashes per second, was the maximum
time that a subject could be expected to track without a significant decrement in his
performance. The problems presented at 0.5 and 1.0 flashes per second were given in four and two
separate sessions, respectively, in order to limit all sessions to a maximum of 25 minutes of
continuous tracking.

2.5. TASK INSTRUCTIONS

Careful attention was paid to the instruction of the subjects prior to the recorded
experimental sessions. This effort was repaid by an excellent consistency in the tracking behavior
of the eight subjects, four in the pilot and four in the main experiment. (The standard
instructions used are shown in Appendix D.) These served only as the initial formal introduction,
however. Actually about 10 minutes was spent in discussing the task to be performed and the
purpose of the experiment. Instruction was concluded when the experimenter was satisfied
that all important concepts were understood.

A 45-minute practice session preceded the 15 hours of data recording. During this session,
the response was continuously monitored and the subject was assured of the quality of his
performance. The lack of error feedback made it difficult for the subject to evaluate his own
performance until he had some experience with the task.

The instructions were as complete as the subject seemed to need in all but one important
area. He was told nothing about the dynamics of the input sequence, except that there would be
changes in the probability. He was told to expect both rapid and slow changes. He was instructed
that the pay he would receive would be a constant rate per minute of tracking minus the
accumulated squared error during the same interval. The amount was computed automatically on an
analog computer operating during the tracking session. (The circuit used for the pay scheme is
shown in Figure 2.)
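A discrete-time sketch of such a pay rule is given below. It is only an illustration of "constant rate minus accumulated squared error": the analog circuit integrated continuously, and the rate, the error cost, and the sampling interval used here are hypothetical values, not those of the experiment.

```python
def session_pay(responses, probabilities, sample_interval_s,
                rate_per_minute, error_cost):
    """Pay = rate * minutes tracked - error_cost * accumulated squared error.

    responses and probabilities are on the 0-to-1 scale, one value per sample.
    The squared error is accumulated as a rectangle-rule integral over time
    (in seconds). All parameter values are assumptions of this sketch.
    """
    if len(responses) != len(probabilities):
        raise ValueError("responses and probabilities must have equal length")
    minutes = len(responses) * sample_interval_s / 60.0
    accumulated = sum((r - p) ** 2 for r, p in zip(responses, probabilities))
    accumulated *= sample_interval_s
    return rate_per_minute * minutes - error_cost * accumulated


# One minute of tracking sampled every 0.5 s, target probability 0.5 throughout.
target = [0.5] * 120
perfect = session_pay(target, target, 0.5, rate_per_minute=0.05, error_cost=0.01)
biased = session_pay([0.6] * 120, target, 0.5, rate_per_minute=0.05, error_cost=0.01)
print(perfect, biased)
```

Because the penalty is quadratic in the error, a rule of this form favors reporting the expected value of the probability rather than an extreme setting.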


3

EXPERIMENTAL  RESULTS 

3.1.  THE  PILOT  EXPERIMENT 

A pilot experiment was run prior to the main experiment in an attempt to answer three
questions. The first concerned the general form and quality of the response. The responses
found were qualitatively similar to the response plotted in Figure 3. Both the response to
change and the estimation of probability were better than expected.


FIGURE  3.  A  TYPICAL  RESPONSE  TO  A  SUBPROBLEM 

The second question answered by the pilot experiment concerned changes in response with
continued performance of the task, reflecting learning or fatigue. One problem was presented to
the four subjects in each of six sessions about two days apart. There was no indication of a
significant change in performance after the first session. To test for specific problem learning,
the problem which had been presented for six sessions was presented again, but backwards. No
decrement in performance was observed. It was concluded that no specific problem learning
had occurred. None of the subjects recognized that the problem had been the same in each of
the six sessions, nor were they able to describe the changes in the probabilities. Tracking
sessions up to 15 minutes caused no particular fatigue or boredom, and it was concluded that
sessions of 25 minutes would be permissible on the more isolated, impressive, and comfortable
console used in the main experiment.

The third question answered by the pilot experiment concerned the kind and amount of
instruction needed to bring the subjects up to a reasonably consistent level of performance.
The instructional method described in Section 2 was the result. The subjects in the main
experiment performed consistently after the instruction and practice. (Appendix E presents two
interesting exceptions.) The important task learning evidently occurs during the first few minutes
of performance, and the 45-minute practice session was sufficient.

3.2.  RESPONSE  MEASURES 

The response measures were chosen after study of the Sanborn records from the main
experiment. The form of these responses was the same as that in the pilot experiment (shown
in Figure 3). The response was characterized by fairly rapid changes separated by periods of
little or no change. This discontinuous form indicates that the behavior might be described in
terms of a series of decisions concerning changes in the probability. A descriptive model with
this characteristic is developed in Section 4. Several of the response measures were chosen to
fit this response form. All of the response measures refer to individual subproblems. The
following measures were calculated.

(a) DETECTION, D: the number of samples from the step change to the point where the
response has changed 0.05 in the direction of the new probability from its value at the point of
the step change. If $R_n$ is the response at point n in a subproblem which starts at n = 1, the point
of detection is the point where $R_n = R_0 \pm 0.05$ (the plus sign indicating an increasing step and
the minus a decreasing step).

(b) NO DETECTION, $\bar{D}$: the number of subproblems in which detection did not occur; that
is, $R_n$ never came to within 0.05 of the new probability.

(c) CONVERGENCE, C: the number of samples from the step change to the point where
the response is within 0.05 of the new probability. The point of convergence is that at which
$R_n = P \pm 0.05$, where $P$ is the probability following the step change. The point of convergence
is the first entry into this region from either side.

(d) NO CONVERGENCE, $\bar{C}$: the number of subproblems in which convergence did not occur;
that is, $R_n$ was always outside the 0.05 region about $P$.

(e) INITIAL CONVERGENCE, IC: the number of subproblems in which the response was
within the convergence region about the new probability at the point of the step change:
$P - 0.05 \le R_0 \le P + 0.05$.

(f) ROOT MEAN SQUARE ERROR, RMSE: the square root of the mean square error over
the entire subproblem. Error equals the response minus the probability. The response was
measured on a 0 to 1 scale corresponding to the probability measure, and the error can thus
be considered an error in probability. For a subproblem of length $M$,

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{n=1}^{M} (R_n - P)^2}$$

(g) ROOT MEAN SQUARE ERROR AFTER C, RMSE_C: the square root of the mean square
error from the point of convergence to the end of the subproblem.

This measure and the following two were made only when either convergence or initial
convergence was measured.

(h) MEAN ERROR AFTER C, ME_C: the mean error from the point of convergence to the
end of the subproblem.

(i) FALSE ALARM RATE, FAR: the number of times per sample that the response left
the 0.05 convergence region between the point of convergence and the end of the subproblem.
If $P - 0.05 \le R_{n-1} \le P + 0.05$, and $R_n < P - 0.05$ or $R_n > P + 0.05$, then the point $n$
would be a false alarm point.
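The definitions (a) through (i) can be sketched in code for a single subproblem. This is only an illustrative reading of the definitions, not the report's computer analysis: the function name, argument names, the floating-point tolerance, and the choice to normalize FAR by the number of samples from convergence to the end of the subproblem are all assumptions.

```python
import math


def response_measures(responses, r0, p_old, p_new, criterion=0.05):
    """Compute the measures for one subproblem (hypothetical helper).

    responses -- R_1..R_M, the responses after the step change
    r0        -- R_0, the response at the point of the step change
    p_old     -- probability before the step (gives the step direction)
    p_new     -- probability P following the step change
    """
    eps = 1e-9  # floating-point tolerance (an implementation detail, not in the text)
    sign = 1.0 if p_new > p_old else -1.0

    def inside(r):
        # True when r lies in the 0.05 convergence region about P.
        return abs(r - p_new) <= criterion + eps

    # (a) first n at which the response has moved 0.05 toward the new probability
    detection = next((n for n, r in enumerate(responses, start=1)
                      if sign * (r - r0) >= criterion - eps), None)
    # (c) first entry into the convergence region, from either side
    convergence = next((n for n, r in enumerate(responses, start=1)
                        if inside(r)), None)
    # (e) response already inside the region at the step change
    initial_convergence = inside(r0)

    errors = [r - p_new for r in responses]
    measures = {
        "detection": detection,
        "convergence": convergence,
        "initial_convergence": initial_convergence,
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "rmse_c": None, "me_c": None, "far": None,
    }

    if initial_convergence:
        start, path = 1, [r0] + list(responses)
    elif convergence is not None:
        start, path = convergence, list(responses[convergence - 1:])
    else:
        return measures  # no convergence: (g)-(i) are not measured

    tail = errors[start - 1:]
    measures["rmse_c"] = math.sqrt(sum(e * e for e in tail) / len(tail))
    measures["me_c"] = sum(tail) / len(tail)
    # (i) a false alarm is a step from inside the region to outside it
    alarms = sum(1 for prev, cur in zip(path, path[1:])
                 if inside(prev) and not inside(cur))
    measures["far"] = alarms / len(tail)  # assumed normalization
    return measures


example = response_measures(
    [0.30, 0.30, 0.40, 0.50, 0.58, 0.60, 0.70, 0.62],
    r0=0.30, p_old=0.30, p_new=0.60)
print(example)
```

In this made-up series the response first moves 0.05 toward the new probability at the third sample and first enters the convergence region at the fifth, and the single excursion to 0.70 counts as one false alarm.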

Detection and convergence were measures designed to describe the discontinuous response
form. The 0.05 criterion used in these measures was selected after an extensive study of the
data. In about 80% of the subproblems, a sudden response to the new probability occurred
shortly after a step change. This movement was interpreted to be the result of the perception
of the change in the probability. The 0.05 detection criterion was selected as measuring this
point with fair consistency. For step changes greater than about 0.15, this measure is relatively
insensitive to the choice of the 0.05 criterion since the sudden response was characteristically
0.10 or greater.

Convergence is more dependent on the selection of 0.05 as a criterion. The point of
convergence was most useful, however, in determining the beginning of measures (g), (h), and (i).
These measures were all averaged over flashes, and the location of the convergence point did not
affect their values. Detection and convergence, as measured with the 0.05 criterion, are not
particularly informative for the smallest step change, 0.06.

Measures (g), (h), and (i), the three starting at the point of convergence, indicate the subject's
static estimation ability. The subject is operating under what might be called a dynamic set,
however; that is, he has an expectancy for changes in the probability. Changes in his responses
during this period could be called microstructure tracking, since the subject was not aware that
the probability was constant. No measures were made of the persistence of this microstructure
tracking on the longer subproblems. This behavior might begin to diminish with long
presentations of a constant probability.

RMSE  was  the  only  measure  made  on  all  subproblems  regardless  of  their  response  form. 

It  indicates  the  overall  quality  of  performance.  RMSE  is  a  common  measure  in  continuous  tasks 
of  this  kind,  largely  because  it  is  easily  derived  and  manipulated  in  mathematical  expressions. 


3.3.  DATA  ANALYSIS 

There  was  one  subproblem  for  each  rate,  step  size,  step  direction,  probability,  constraint, 
and  subject,  3440  in  all.  The  combinations  of  variables  presented  here  were  judged  to  be  the 
most  informative  set  among  the  total  available  from  the  computer  analysis.  These  quantitative 
performance  measures  were  the  intended  output  of  this  experiment,  and  since  no  testable 
hypotheses  were  generated,  no  tests  of  statistical  significance  were  made. 


3.4.  EXPERIMENTAL  DATA 

3.4.1. DIFFERENCES BETWEEN SUBJECTS. No qualitative differences existed among
the four subjects used in the main experiment. The four subjects in the pilot experiment
behaved similarly to those used in the main experiment and to each other. Inspection of the data
indicated that for general performance information, it would be best to average the data over
subjects. (Appendix F presents some of the subject-by-subject data.)

3.4.2.  DETECTION,  D.  Figures  4  through  7  show  the  effects  of  the  independent  variables 
on  detection.  Since  the  data  on  step  direction  show  no  appreciable  difference  between  positive 
and  negative  directions,  they  are  averaged  together  in  all  figures.  The  interaction  of  step  size 
and  rate  shown  in  Figure  4  shows  the  most  interesting  relation  found.  Here  detection  decreases 
fairly  linearly  with  step  size  and  increases  fairly  linearly  with  the  logarithm  of  rate. 

The linear increase in detection with the logarithm of the rate probably reflects a
combination of factors influencing the response. A small linear increase with rate would be caused
by a constant reaction and movement time. For the usual tracking tasks, this might be expected
to be on the order of 0.5 seconds and to yield a lag of 2 flashes at 4 fps and 4 flashes at 8 fps.

FIGURE 4. DETECTION AS A FUNCTION OF
STEP SIZE AND FLASH RATE. Small- and
large-step problems are plotted separately,
the small extending from 0.06 to 0.24, and the
large from 0.16 to 0.64. Detection is measured
in flashes.


FIGURE 5. PERCENTAGE OF SUBPROBLEMS
IN WHICH "NO DETECTION" OCCURS,
AS A FUNCTION OF STEP SIZE AND FLASH
RATE. The small- and the large-step problems
are plotted separately, the small extending
from 0.06 to 0.24 and the large from 0.16
to 0.64.


The more important factor is probably a change in the method of performing the task as rate
changes. At rates of 0.5 and 1 fps the subjects reported counting the flashes at times,
occasionally counting the number of flashes of the lower frequency and comparing this to an estimate
of the total number of flashes. They did not use any procedure of this sort consistently, however,
at least not one apparent to them. They all reported that the rate of 2 fps was the most difficult.
Evidently the methods which they had used effectively at 0.5 and 1 fps became difficult if not
impossible at 2 fps. Beginning at 4 fps it is clearly impossible to respond to separate flashes
and the series is probably perceived in groups of flashes. The task becomes similar to a
continuous tracking task at these rates. Reese [11] postulated that subjects' mechanism for counting
light flashes would change at about 4 flashes per second.

Figure 4 shows an effect due to the presentation of the step changes in two separate series,
the small- and the large-step problems. There is a region of overlap in step size between
these two problems. The smallest change in the large-step problem is 0.16, and the largest in
the small-step problem is 0.24. In this overlapping region the small-step problem yields
detections of from one to six flashes higher than the large-step problem at all rates. The subjects
were evidently modifying their tracking method according to the type of problem being presented.
The large- and small-step problems were ordered randomly, of course, and the subjects had no
prior indication that there were two problem types. This change is perhaps not surprising
considering the difference between the two problem types. The average step changes were 0.15
in the small-step problem and 0.40 in the large-step problem. Step changes of about 0.30 and
larger are readily noticed.

FIGURE 6. DETECTION AS A
FUNCTION OF PROBABILITY,
CONSTRAINT, AND SMALL-
AND LARGE-STEP PROBLEMS.
Detection is measured in flashes.

FIGURE 7. DETECTION AS A FUNCTION
OF STEP SIZE, SAMPLE RATE, AND
CONSTRAINT. The small- and large-step problems
are plotted separately, the small extending
from 0.06 to 0.24 and the large from
0.16 to 0.64. Detection is measured in
flashes.

The subjects appear to have made larger, more decisive response changes on the
large-step problem than on the small-step problem. This more responsive behavior is appropriate in
quickly reducing the large errors following the larger step changes.

Figure 5 shows the percentage of "no detections" for the total number of subproblems, as
a function of step size and rate. About 90% of the "no detections" occurred with the combination
of rate above 4 fps and step size below 0.15. Some of the "no detections" were probably caused
by lapses of attention. At 4 fps a 42-flash subproblem is over in 11 seconds. In 25 minutes of
continuous tracking a few 11-second lapses are certainly to be expected.

Figure 6 shows the effect of probability on detection. Perhaps the most interesting finding
is that detection is not appreciably smaller for the extreme probabilities. In Section 4 it will
be seen that responses generated by simple running averages produce detections which are
similarly independent of probability.

The variability among a set of detections of a particular step size and rate will depend on
the probability, however. Detections of central probabilities, those near 0.5, will have more
variability than detections of extreme probabilities, those nearer 0 or 1.

The effect of the flash generation constraint on detection is shown in Figures 6 and 7.
Constraint has no particular effect on average detection, but like probability it affects the variability
of a set of detections. The constrained problems yield less variable detections.

3.4.3. CONVERGENCE, C. Figures 8 through 12 show the effects of the independent
variables on convergence. The interesting effects are again with step size and rate. The effect of
rate on convergence is similar to its effect on detection: a linear increase in convergence with
the logarithm of rate. The effect of increasing step size is to increase convergence, although
the increase is small. The number of flashes between detection and convergence increases as
step size increases. This probably reflects the size of the response more than any other factor.
Most of the subproblems show a response successively approaching the new probability rather
than one that overshoots.

Convergence shows a difference, similar to that noted in detection, between the small- and
the large-step problems in the region of overlapping step size.

"No convergence," expressed as a percentage of subproblems, is shown in Figure 9. "No
convergence" remains relatively insensitive to changes in step size except for the largest step,
0.64, where it is zero. It is approximately 10% for the large-step problem and 12.5% for the
small-step problem. "No convergence" rises sharply with increasing rate, reaching about 28%
at 8 fps. This is consistent with the data, which show convergence equal to 35 flashes at 8 fps,
about the length of the shortest subproblem.

"Initial convergence" has a high of 35% for a step change of 0.06 and goes to zero for steps
of 0.48 and 0.64. It increases slightly with rate from about 8 to 12%.

The relationship between probability and convergence is shown in Figure 11. Convergence
is relatively insensitive to probability, as was detection.

The effects of constraint on the sample generation are shown in Figures 11 and 12. Again,
as with detection, there is little if any effect.

FIGURE 8. CONVERGENCE AS A FUNCTION
OF STEP SIZE AND FLASH RATE.
The small- and large-step problems are
plotted separately, the small extending
from 0.06 to 0.24 and the large from 0.16
to 0.64. Convergence is measured in
flashes.

FIGURE 9. PERCENTAGE OF SUBPROBLEMS
IN WHICH "NO CONVERGENCE"
OCCURS, AS A FUNCTION
OF STEP SIZE AND FLASH RATE.
The small- and large-step problems
are plotted separately, the small extending
from 0.06 to 0.24 and the
large from 0.16 to 0.64.

FIGURE 10. PERCENTAGE OF SUBPROBLEMS
IN WHICH "ACCIDENTAL INITIAL CONVERGENCE"
OCCURS, AS A FUNCTION OF STEP
SIZE AND FLASH RATE.

FIGURE 11. CONVERGENCE AS A FUNCTION
OF PROBABILITY, CONSTRAINT,
AND SMALL- AND LARGE-STEP PROBLEMS.
Convergence is measured in flashes.

FIGURE 12. CONVERGENCE AS A FUNCTION OF STEP
SIZE, FLASH RATE, AND CONSTRAINT. The small- and
large-step problems are plotted separately, the small extending
from 0.06 to 0.24 and the large from 0.16 to 0.64.
Convergence is measured in flashes.


3.4.4. ROOT MEAN SQUARE ERROR, RMSE. This measure was introduced to provide a
single overall indicator of the task performance. The most informative variation of RMSE is
variation with rate, plotted in Figure 13. RMSE increases linearly with rate from 1 to 8 fps.

FIGURE 13. ROOT MEAN SQUARE ERROR OVER THE WHOLE
SUBPROBLEM AS A FUNCTION OF FLASH RATE

It is interesting to evaluate this performance measure on a time basis, as might be done when
the estimation must take the shortest time possible. Dividing RMSE by fps yields values of
error-seconds per flash which decrease as rate increases, going from 0.134 at 1 fps to 0.022
at 8 fps. This decrease might well continue with even higher rates, as the task becomes the
tracking of the relative brightness of the lights. Either the limitations on the judgment of
relative brightness or simple reaction time would finally limit the performance. This performance
index must be viewed with caution. The error itself has a meaningful upper bound at the level
where the lever is left stationary or is moved in some manner independent of the flashes. As
this error level is approached, further increases in rate would continue to decrease the index of
error-seconds per flash but the index would have little meaning.
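The time-based index is simply RMSE divided by flash rate. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the 8 fps input of 0.180 is the RMSE quoted in the summary of results rather than a value read from Figure 13.

```python
def error_seconds_per_flash(rmse, fps):
    """Time-based performance index: RMSE divided by the flash rate."""
    return rmse / fps


# RMSE values assumed from the figures quoted in the text.
for rmse, fps in ((0.134, 1), (0.180, 8)):
    print(f"{fps} fps: {error_seconds_per_flash(rmse, fps):.4f} error-seconds per flash")
```

Because the index keeps falling as the rate rises even when RMSE itself is near its upper bound, a small index at a high rate does not by itself indicate good estimation.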

The following three measures were made from the point of convergence (or, if initial
convergence occurred, from the beginning of the subproblem) to the end of the subproblem. They
are therefore measures made on an average of 85 to 90% of all of the subproblems and on about
95% of those subproblems with step changes above 0.15 at rates below 4 fps.

3.4.5. MEAN ERROR AFTER CONVERGENCE, ME_C. The mean error is shown as a
function of probability in Figure 14. The average estimate is essentially unbiased at all probabilities.
The largest error is smaller than the least scale division on the subject's response indicator,
0.02. Mean error was not significantly affected by rate, constraint, step size, or subjects.

This finding contradicts a body of conjecture based in part on the results of static
estimation and choice experiments. Neither the overestimation of high nor the underestimation of
low probabilities appears. The excellence in static estimation was undoubtedly due at least in
part to the two distinctive features of the task, the dynamic estimation and the use of the
tracking lever as the response mechanism.

FIGURE 14. MEAN ERROR AS A FUNCTION OF PROBABILITY. This measure
is made from convergence to the end of the subproblem.

3.4.6. ROOT MEAN SQUARE ERROR AFTER CONVERGENCE, RMSE_C. RMSE_C is shown
in Figures 15 through 17. The only independent variable not affecting RMSE_C is step size. This
indicates that the period after the point of convergence is not affected by step size. The
constraint on the generation of the flashes reduced the RMSE_C by about 0.014 and does not appear
to interact with either step size or rate. RMSE_C decreases with increasing rate from 0.5 to
2 fps and thereafter remains relatively constant. Considered together with the data indicating
smaller detection values at the lower rates, it is highly probable that the number of decisions
concerning changes in the probability on a per flash basis is highest at the lowest rate. Thus
the additional decision time available at the lower rates permitted smaller detection values but
resulted in larger RMSE_C when the probability was constant.

The effect of probability on the RMSE_C is shown in Figure 17. The "random" problems are
consistently higher than the "constrained" problems at all probabilities. The N = 17.3 line is
the RMSE, or standard deviation, of a 17.3-flash average. The subject's response is about
this good or better at all probabilities.
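For independent flashes generated with probability P, the standard deviation of an N-flash average follows the usual binomial result, the square root of P(1 - P)/N. The sketch below evaluates that reference curve for N = 17.3; the function name is an assumption, and the independence assumption applies to the "random" problem only.

```python
import math


def running_average_sd(p, n_flashes=17.3):
    """Standard deviation of an average of n_flashes independent 0/1 samples.

    Assumes independent flashes with probability p (the "random" problem);
    a non-integer n_flashes is treated simply as a parameter of the formula.
    """
    return math.sqrt(p * (1.0 - p) / n_flashes)


for p in (0.02, 0.10, 0.30, 0.50):
    print(f"P = {p:.2f}: sd = {running_average_sd(p):.3f}")
```

The curve is symmetric about P = 0.5 and falls toward zero at the extreme probabilities, which is the shape against which the subjects' RMSE_C is compared.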

3.4.7. FALSE ALARM RATE, FAR. The number of false alarms per flash is shown in
Figures 18 through 20. Its behavior is similar to RMSE_C. It is similarly insensitive to the
size of the step change. Increasing rate causes a decrease in FAR up to 4 fps with an apparent
leveling off above 4 fps. These data lend additional support to the hypothesis concerning an
increase in number of decisions per flash at the lower rates. False alarms can be considered
as indicating decisive changes in the estimate.

Variation among subjects was high for 4 and 8 fps. See Figure 28, Appendix F.

FIGURE 15. ROOT MEAN SQUARE ERROR AS A
FUNCTION OF STEP SIZE AND CONSTRAINT.
This measure is made from convergence to the end
of the subproblem.

FIGURE 16. ROOT MEAN SQUARE ERROR AS A
FUNCTION OF FLASH RATE AND CONSTRAINT.
This measure is made from convergence to the end
of the subproblem.

FIGURE 17. ROOT MEAN SQUARE ERROR AS
A FUNCTION OF PROBABILITY AND CONSTRAINT.
This measure is made from convergence
to the end of the subproblem. The standard
deviation for a 17.3 sample mean is shown
for the random problem.

FIGURE 18. FALSE ALARM RATE IN FALSE ALARMS
PER FLASH AS A FUNCTION OF STEP SIZE AND CONSTRAINT.
This measure is made from convergence to
the end of the subproblem.

FIGURE 19. FALSE ALARM RATE IN FALSE ALARMS
PER FLASH AS A FUNCTION OF FLASH RATE AND
CONSTRAINT. This measure is made from convergence
to the end of the subproblem.

FIGURE 20. FALSE ALARM RATE IN
FALSE ALARMS PER FLASH AS A FUNCTION
OF PROBABILITY AND CONSTRAINT.
The measure is made from convergence to
the end of the subproblem.

FAR remains constant over the entire probability range with the exception of the extreme
values, 0.02 and 0.98. These probabilities were usually estimated as 0 or 1, with an excursion
away from 0 or 1 only after an occurrence of the infrequent flash. Since FAR did not change
with probability, it appears that the rate of "decisive" movements (greater than 0.05 from the
probability) remained constant for all probabilities. The reduction in RMSE_C as the probability
tends to extreme values therefore indicates that the time spent at these "erroneous" estimates
decreased with extreme probabilities. This hypothesis is supported by observations made
during the tracking sessions. The lever movements appeared larger although less frequent at
the more extreme probabilities. The increase in magnitude evidently compensated for the
decrease in frequency to maintain the FAR at a constant level.

The constrained series produced a slightly higher false alarm rate than the random series.
The constrained series has a greater number of runs of right or left flashes and would be
expected to yield a higher decision rate. All of the FAR data will be dependent on the false alarm
criterion level. A larger criterion could well reverse the constraint finding, for example, since
the random series probably produces larger decision movements than the constrained series.

3.5. SUMMARY OF RESULTS

The response to a step change in probability can be described in three regions: the period
before any response to the change, that is, before the point of detection; the period before the
convergence on a new estimate; and the period from the convergence point to the end of the
subproblem. These regions were defined mathematically as functions of probability, response form,
and somewhat arbitrary constants in order to achieve a complete description of the response.

Detection increases with increasing rate and decreases with increasing step size. The
range was from 4 to 24 flashes for a rate range of 0.5 to 8 flashes per second and a step size
range of 0.06 to 0.64. Detection was approximately nine flashes for a step of 0.32 at 1 fps.

Convergence increases with both rate and step size. The range was from 11 to 38 flashes
for the same step and rate ranges stated above. Convergence was approximately 15 flashes
for a step of 0.32 at 1 fps.

Both detection and convergence were independent of the constraint imposed on the
generation of the flash series. Both were independent of probability.

After the point of convergence the average estimate was unbiased at all probabilities. This
unbiased estimate had an RMS error, or standard deviation, of about 0.06.

The overall task performance was measured by the RMS error throughout the subproblem.
RMSE increased linearly with rate from 0.138 at 1 fps to 0.180 at 8 fps.

4

MATHEMATICAL MODELS

Three  mathematical  models  will  be  derived  in  this  chapter.  Two  of  these  will  be  called 
normative,  since  the  purpose  for  their  derivation  is  to  provide  standards  with  which  to  compare 
the  data  presented  in  Section  3.  The  third  is  a  descriptive  model  designed  to  simulate  the  human 
performance. 

The somewhat arbitrary forms of the normative models, and the optimized parameters
used, were selected to provide the best RMS error fit to the various inputs used in the
probability tracking task. One form selected is a constant weighted average over a finite number
of past flashes. The simplicity of this model makes it ideal for intuitive comparisons with the
subjects' performance. The number of flashes in the running average is selected to give the
best fit. The other model has geometrically decreasing weight for flashes extending into the
past. This model is more appealing from the standpoint of response to the step inputs. It also
corresponds to assumptions often made concerning the human immediate memory function. The
best fit is found by selecting the appropriate geometric ratio.

More  sophisticated  linear  models,  and  certainly  some  nonlinear  ones,  would  undoubtedly 
perform  this  task  with  a  lower  RMSE  than  the  two  models  selected.  The  value  of  more  complex 
models  for  providing  simple  standards  is  marginal,  however. 

The descriptive model was derived from thoughts on how the subjects performed the task.
Its form arises from the qualitative aspects of the data and from observations of the subjects'
behavior. It has four parameters, which are adjusted to yield a minimum RMSE fit to a
subject's response.

The normative models to be considered have the form

$$r(n) = \sum_{i=1}^{N} w_i s_{n-i+1} \tag{1}$$

where $r(n)$ is the model's response or output at the point $n$ in the sample series, and $w_i$ is a
weight attached to the sample $s_{n-i+1}$. The response at $n$ is thus the weighted average of the
sample at $n$ and its $N - 1$ immediate predecessors. This is an averaging or smoothing model
intuitively appropriate to this task. It is limited to samples at and prior to the response point,
considering only a finite number of these, and is therefore physically realizable. The weight $w_i$
is not a function of $n$ and could be described as sample-invariant.

The random variables $s_n$ are drawn from an infinite population and are independent. They
have values 0 or 1, corresponding respectively to left and right on the subject's display. The
probability of a 1 is $P$, and the probability of a 0 is therefore $1 - P$.

When the $N$ samples are all generated from a static distribution described by the probability
$P$, it is desirable that the estimate be unbiased, or that

$$\overline{r(n)} = P \tag{2}$$

where $\overline{r(n)}$ is the expected value of $r(n)$, an ensemble average. This simply requires that

$$\sum_{i=1}^{N} w_i = 1 \tag{3}$$

The responses of the two model forms will be derived for a subproblem beginning with
$n = 1$ as the first sample of the new probability and ending with $n = M$. The previous probability
will be $P_1$ and the subproblem probability $P_2$. The step change is therefore $P_2 - P_1$. For
$N < n \le M$ the response will be called steady-state, since the samples are all from a static
distribution. For $1 \le n \le N$ the response will be called transient.
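The general form of Equation 1 can be sketched directly. The helper below is illustrative only (its name and the sample series are assumptions); it applies a list of weights, most recent sample first, and is shown with the constant-weight running average, for which every weight is 1/N so that Equation 3 is satisfied.

```python
def model_response(samples, weights):
    """Return r(n) = sum_i w_i * s_{n-i+1} for every n with N samples available.

    samples -- the 0/1 flash series s_1, s_2, ...
    weights -- w_1..w_N, where w_1 multiplies the most recent sample s_n
    """
    n_w = len(weights)
    responses = []
    for n in range(n_w, len(samples) + 1):      # n is 1-based
        window = samples[n - n_w:n]             # s_{n-N+1} .. s_n
        # reversed(window) yields s_n first, so it pairs with w_1
        responses.append(sum(w * s for w, s in zip(weights, reversed(window))))
    return responses


N = 3
uniform = [1.0 / N] * N                 # constant weights; they sum to 1
flashes = [1, 0, 1, 1, 0, 1]            # made-up series: 1 = right, 0 = left
print(model_response(flashes, uniform))
```

Because only samples at and before n enter each response, the sketch has the same physical realizability as the model it illustrates.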

4.1.  A  MODEL  WITH  GEOMETRIC  WEIGHTING 

The first model to be considered has a weighting function

$$w_i = a r^{i-1} \tag{4}$$

where $a$ and $r$ are constants and $0 < r < 1$. This function assigns geometrically decreasing
weights to the samples. Limiting $r$ to the range 0 to 1 confines the function to one assigning
monotonically decreasing weights to samples receding from $n$.

The value of $N$, the number of samples included in one computation, will be selected as a
number large enough to assure the relative unimportance of the weight at $i = N$, $a r^{N-1}$,
compared to the weight at $i = 1$, $a$. This merely implies that the function's memory extends smoothly
to the point of essentially complete "forgetting." The exact value of $N$ in any particular model
of this form is relatively unimportant to the considerations that follow. It will simply be
assumed that

$$r^N \ll 1 \tag{5}$$

and all quantities of this magnitude will be dropped.

Of primary interest is the selection of $r$ to produce an optimum model, that is, one having
the least mean squared error. This particular measure of performance was the same one used
in measuring the subject's performance and in the payoff scheme.

The constant $a$ is selected to satisfy Equation 3, which becomes

$$\sum_{i=1}^{N} a r^{i-1} = a + ar + ar^2 + \ldots + ar^{N-1} = a\,\frac{1 - r^N}{1 - r} = 1 \tag{6}$$

$$a = 1 - r, \qquad r^N \ll 1 \tag{7}$$
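A short numerical check illustrates how little is lost by dropping the $r^N$ term in the normalization. The function name and the values r = 0.8 and N = 50 are arbitrary choices for illustration, not parameters from the report.

```python
def geometric_weights(r, n_terms):
    """Weights w_i = a * r**(i-1), i = 1..N, normalized exactly to sum to 1."""
    a = (1.0 - r) / (1.0 - r ** n_terms)   # exact constant from the geometric sum
    return [a * r ** (i - 1) for i in range(1, n_terms + 1)]


r, N = 0.8, 50                              # illustrative values only
weights = geometric_weights(r, N)
print("sum of weights:", sum(weights))
print("exact a:", weights[0], " approximate a = 1 - r:", 1.0 - r)
print("r**N:", r ** N)
```

With these values $r^N$ is on the order of $10^{-5}$, so the exact constant and the approximation $a = 1 - r$ agree to several decimal places.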

We will be concerned with two quantities: $\overline{r(n)}$, the expected value of $r(n)$ at the point $n$,
and $\sigma_r^2(n)$, the variance of $r(n)$ at the point $n$. These are ensemble averages.

For the expected value of $r$ we have

$$\overline{r(n)} = E\left[\sum_{i=1}^{N} w_i s_{n-i+1}\right] \tag{8}$$

and since the $w_i$ are constant over the ensemble,

$$\overline{r(n)} = \sum_{i=1}^{N} w_i\,\overline{s_{n-i+1}} \tag{9}$$

For the step function input, $\overline{r(n)}$ will depend on $P_1$ and $P_2$ during the transient phase and on
$P_2$ alone during the steady-state phase.

For $1 \le n \le N$ we have

$$\overline{r(n)} = (1 - r^n) P_2 + r^n P_1 = P_2 - r^n (P_2 - P_1) \tag{10}$$

For $N < n \le M$,

$$\overline{r(n)} = P_2 \tag{11}$$
For the variance we have the variance of the sum of the $w_i s_{n-i+1}$ terms. Since the $s_n$ are
independent, we have the sum of the variances of the individual terms

\[ \sigma_r^2(n) = a^2 \sigma_s^2(n) + a^2 r^2 \sigma_s^2(n-1) + a^2 r^4 \sigma_s^2(n-2) + \dots + a^2 r^{2(N-1)} \sigma_s^2(n-N+1) \tag{12} \]

where $\sigma_s^2(n)$ is the variance of the sample $s_n$. Since $\sigma_s^2(n) = P(1 - P)$, where P is the
probability with which $s_n$ was generated, $\sigma_s^2(n)$ will be a constant for constant P, and in
particular it will have two values during the transient phase, $\sigma_1^2 = P_1(1 - P_1)$ and
$\sigma_2^2 = P_2(1 - P_2)$.

For the transient phase we then have

\[ \sigma_r^2(n) = (1 - r)^2 \, \frac{1 - r^{2n}}{1 - r^2} \, \sigma_2^2 + (1 - r)^2 \left[ \frac{1 - r^{2N}}{1 - r^2} - \frac{1 - r^{2n}}{1 - r^2} \right] \sigma_1^2 \tag{13} \]

and for the steady-state phase,

\[ \sigma_r^2(n) = \frac{1 - r}{1 + r} \, \sigma_2^2 \tag{14} \]
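
As a numerical cross-check on the variance expressions, the sum of weighted sample variances can be evaluated term by term and compared with the closed form that results when $r^{2N}$ is dropped, namely $\frac{1-r}{1+r}[\sigma_2^2 - r^{2n}(\sigma_2^2 - \sigma_1^2)]$. This is a minimal sketch; the function names and test values are assumptions made for illustration.

```python
def transient_variance_direct(n, r, N, var1, var2):
    """Sum of (a * r**(i-1))**2 * var(s_(n-i+1)) over the N weighted samples."""
    a = 1.0 - r
    total = 0.0
    for i in range(1, N + 1):
        # Samples 1..n were generated with P2 (variance var2), earlier ones with P1.
        var = var2 if (n - i + 1) >= 1 else var1
        total += (a * r ** (i - 1)) ** 2 * var
    return total


def transient_variance_closed(n, r, var1, var2):
    """Transient variance with r**(2N) dropped."""
    return (1.0 - r) / (1.0 + r) * (var2 - r ** (2 * n) * (var2 - var1))


def steady_state_variance(r, var2):
    """Steady-state variance: (1 - r) / (1 + r) * var2."""
    return (1.0 - r) / (1.0 + r) * var2


if __name__ == "__main__":
    var1, var2 = 0.2 * 0.8, 0.5 * 0.5   # P1 = 0.2, P2 = 0.5 (illustrative values)
    for n in (1, 10, 50, 200):
        print(n, transient_variance_direct(n, 0.9, 200, var1, var2),
              transient_variance_closed(n, 0.9, var1, var2))
```

At n = N the transient value coincides with the steady-state variance to within the dropped $r^{2N}$ term.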


We can now proceed with the formulation of the model's performance in terms of its mean
square error. The error during the transient phase can be written as

\[ e(n) = r(n) - P_2 = [\bar{r}(n) - P_2] + [r(n) - \bar{r}(n)] \tag{15} \]

and the squared error is then

\[ e^2(n) = [\bar{r}(n) - P_2]^2 + [r(n) - \bar{r}(n)]^2 + 2 [\bar{r}(n) - P_2] [r(n) - \bar{r}(n)] \tag{16} \]

The expected value of the squared error is then

\[ \overline{e^2}(n) = [\bar{r}(n) - P_2]^2 + \sigma_r^2(n) \tag{17} \]

since $[\bar{r}(n) - P_2]$ is a constant for a particular n and $E[r(n) - \bar{r}(n)]$ is 0.

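The average value of this mean square error over the transient phase of the subproblem
will be $\langle \overline{e^2} \rangle_T$, representing the average over the ensemble and also over samples in the
subproblem. We will then have

\[ \langle \overline{e^2} \rangle_T = \frac{1}{N} \sum_{n=1}^{N} \overline{e^2}(n) = \frac{1}{N} \sum_{n=1}^{N} [\bar{r}(n) - P_2]^2 + \frac{1}{N} \sum_{n=1}^{N} \sigma_r^2(n) \tag{18} \]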




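When Equation 10 is used, the first term on the right side of Equation 18 becomes

\[ \frac{1}{N} \sum_{n=1}^{N} [\bar{r}(n) - P_2]^2 = \frac{1}{N} \sum_{n=1}^{N} r^{2n} (P_1 - P_2)^2 = \frac{(P_1 - P_2)^2}{N} \, \frac{r^2}{1 - r^2}, \qquad r^N \ll 1 \tag{19} \]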


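When Equation 13 is used, the second term on the right side of Equation 18 becomes

\[ \frac{1}{N} \sum_{n=1}^{N} \sigma_r^2(n) = \frac{1}{N} \sum_{n=1}^{N} \frac{1 - r}{1 + r} \left[ \sigma_2^2 - r^{2n} (\sigma_2^2 - \sigma_1^2) \right] \]

\[ = \frac{1 - r}{N (1 + r)} \left[ N \sigma_2^2 - \frac{r^2}{1 - r^2} (\sigma_2^2 - \sigma_1^2) \right] \]

\[ = \frac{1 - r}{1 + r} \left[ \sigma_2^2 - \frac{r^2}{N (1 - r^2)} (\sigma_2^2 - \sigma_1^2) \right], \qquad r^N \ll 1 \tag{20} \]

The average mean square error during the transient phase, $\langle \overline{e^2} \rangle_T$, is then

\[ \langle \overline{e^2} \rangle_T = \frac{(P_1 - P_2)^2}{N} \, \frac{r^2}{1 - r^2} + \frac{1 - r}{1 + r} \left[ \sigma_2^2 - \frac{r^2}{N (1 - r^2)} (\sigma_2^2 - \sigma_1^2) \right] \tag{21} \]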


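The average mean square error during the steady-state phase is simply the variance
$\sigma_r^2(n)$, given by Equation 14:

\[ \langle \overline{e^2} \rangle_{SS} = \frac{1 - r}{1 + r} \, \sigma_2^2 \tag{22} \]

The average mean square error over the whole subproblem is then

\[ \langle \overline{e^2} \rangle_{SP} = \frac{N}{M} \langle \overline{e^2} \rangle_T + \frac{M - N}{M} \langle \overline{e^2} \rangle_{SS} \]

\[ = \frac{(P_1 - P_2)^2}{M} \, \frac{r^2}{1 - r^2} + \frac{N}{M} \, \frac{1 - r}{1 + r} \left[ \sigma_2^2 - \frac{r^2}{N (1 - r^2)} (\sigma_2^2 - \sigma_1^2) \right] + \frac{M - N}{M} \, \frac{1 - r}{1 + r} \, \sigma_2^2 \]

\[ = \frac{(P_1 - P_2)^2}{M} \, \frac{r^2}{1 - r^2} + \frac{1 - r}{1 + r} \left[ \sigma_2^2 - \frac{r^2}{M (1 - r^2)} (\sigma_2^2 - \sigma_1^2) \right] \tag{23} \]

We are interested in the performance of this model over the types of problem given to the
subjects. The average mean square error over a problem is given by

\[ \langle \overline{e^2} \rangle_P = \frac{1}{T} \sum_{i=1}^{S} M_i \left\{ \frac{(P_{i-1} - P_i)^2}{M_i} \, \frac{r^2}{1 - r^2} + \frac{1 - r}{1 + r} \left[ \sigma_i^2 - \frac{r^2}{M_i (1 - r^2)} (\sigma_i^2 - \sigma_{i-1}^2) \right] \right\} \tag{24} \]

where $M_i$ is the length of subproblem i, T is the total problem length in samples, and S is the
total number of subproblems.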
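
The transient, steady-state, and whole-subproblem errors of the geometric model can be cross-checked numerically: weighting the transient error by N/M and the steady-state error by (M - N)/M should reproduce the whole-subproblem expression, which no longer depends on N. The sketch below assumes the closed forms derived above; the function names and sample values are illustrative assumptions only.

```python
def mse_transient(r, N, P1, P2, var1, var2):
    """Average mean square error over the N transient samples."""
    g = r ** 2 / (1.0 - r ** 2)
    return (P1 - P2) ** 2 * g / N + (1.0 - r) / (1.0 + r) * (var2 - g / N * (var2 - var1))


def mse_steady_state(r, var2):
    """Average mean square error in the steady state (the response variance)."""
    return (1.0 - r) / (1.0 + r) * var2


def mse_subproblem(r, M, P1, P2, var1, var2):
    """Average mean square error over a whole subproblem of M samples."""
    g = r ** 2 / (1.0 - r ** 2)
    return (P1 - P2) ** 2 * g / M + (1.0 - r) / (1.0 + r) * (var2 - g / M * (var2 - var1))


if __name__ == "__main__":
    r, N, M = 0.9, 80, 120            # illustrative values with r**N << 1
    P1, P2 = 0.30, 0.62
    var1, var2 = P1 * (1 - P1), P2 * (1 - P2)
    combined = (N / M) * mse_transient(r, N, P1, P2, var1, var2) \
        + ((M - N) / M) * mse_steady_state(r, var2)
    print(combined, mse_subproblem(r, M, P1, P2, var1, var2))
```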


This expression can be simplified by making the following assumptions based on the methods
used for generating the problems (Appendix A). The $P_i$ were selected randomly, without
replacement, from a set of equally frequent values and assigned to the subproblems. The
sum of $(\sigma_i^2 - \sigma_{i-1}^2)/M_i$ therefore approaches zero for long series of subproblems. Similarly
the term $\sum M_i \sigma_i^2 / T$ will approach simply $\sum \sigma_i^2 / S$. Equation 24 can therefore be written as

\[ \langle \overline{e^2} \rangle_P = \frac{r^2}{1 - r^2} \, \frac{1}{T} \sum_{i=1}^{S} (P_{i-1} - P_i)^2 + \frac{1 - r}{1 + r} \, \frac{1}{S} \sum_{i=1}^{S} \sigma_i^2 \tag{25} \]


The large-step problem had values of $|P_{i-1} - P_i|$ of 0.16, 0.32, 0.48, and 0.64, occurring in
12, 10, 8, and 6 subproblems respectively, yielding

\[ \frac{1}{T} \sum_{i=1}^{S} (P_{i-1} - P_i)^2 = 0.00251 \]

$\sigma_i^2$ had values of 0.250, 0.224, 0.148, and 0.020 occurring in 6, 12, 10, and 8 subproblems
respectively, yielding

\[ \frac{1}{S} \sum_{i=1}^{S} \sigma_i^2 = 0.162 \]

T was 2241 samples and S was 36 subproblems. The average mean square error over the
large-step problem is then

\[ \langle \overline{e^2} \rangle_{LSP} = 0.00251 \, \frac{r^2}{1 - r^2} + 0.162 \, \frac{1 - r}{1 + r} \tag{26} \]

The small-step problem had values of $|P_{i-1} - P_i|$ of 0.06, 0.12, 0.18, and 0.24, occurring in
12, 10, 16, and 12 subproblems respectively, and yielding

\[ \frac{1}{T} \sum_{i=1}^{S} (P_{i-1} - P_i)^2 = 0.000465 \]

$\sigma_i^2$ had values of 0.250, 0.246, 0.217, 0.192, 0.120, and 0.085, occurring in 6, 10, 12, 10, 6, and
6 subproblems respectively, and yielding

\[ \frac{1}{S} \sum_{i=1}^{S} \sigma_i^2 = 0.194 \]

T was 3000 samples and S was 50 subproblems. The average mean square error over the
small-step problem is then

\[ \langle \overline{e^2} \rangle_{SSP} = 0.000465 \, \frac{r^2}{1 - r^2} + 0.194 \, \frac{1 - r}{1 + r} \tag{27} \]


It will be of interest, for comparative purposes, to evaluate this model for the case in
which only one value of r is used for both the large- and small-step problems. This model will
be called nondiscriminating in Section 5. In this case the sums in Equation 25 are over both
problem types, with T being equal to 5241 samples and S being 86 subproblems. We have

\[ \frac{1}{T} \sum_{i=1}^{S} (P_{i-1} - P_i)^2 = 0.00134 \]

and

\[ \frac{1}{S} \sum_{i=1}^{S} \sigma_i^2 = 0.181 \]

The average mean square error over the large- plus the small-step problems is then

\[ \langle \overline{e^2} \rangle_{L+S} = 0.00134 \, \frac{r^2}{1 - r^2} + 0.181 \, \frac{1 - r}{1 + r} \tag{28} \]
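
The problem constants quoted above follow directly from the tabulated step sizes, sample variances, and counts. The sketch below recomputes them; the function name `problem_constants` and the variable names are assumptions introduced here, while the numerical inputs are those stated in the text.

```python
def problem_constants(step_sizes, step_counts, variances, variance_counts, T):
    """Return (k, v): k = (1/T) * sum of squared steps, v = (1/S) * sum of sample variances."""
    S = sum(step_counts)
    k = sum(c * d ** 2 for d, c in zip(step_sizes, step_counts)) / T
    v = sum(c * s for s, c in zip(variances, variance_counts)) / S
    return k, v


# Large-step problem: T = 2241 samples, S = 36 subproblems.
LARGE = ([0.16, 0.32, 0.48, 0.64], [12, 10, 8, 6],
         [0.250, 0.224, 0.148, 0.020], [6, 12, 10, 8], 2241)
# Small-step problem: T = 3000 samples, S = 50 subproblems.
SMALL = ([0.06, 0.12, 0.18, 0.24], [12, 10, 16, 12],
         [0.250, 0.246, 0.217, 0.192, 0.120, 0.085], [6, 10, 12, 10, 6, 6], 3000)
# Both problem types pooled: T = 5241 samples, S = 86 subproblems.
COMBINED = (LARGE[0] + SMALL[0], LARGE[1] + SMALL[1],
            LARGE[2] + SMALL[2], LARGE[3] + SMALL[3], LARGE[4] + SMALL[4])

if __name__ == "__main__":
    for name, data in (("large", LARGE), ("small", SMALL), ("combined", COMBINED)):
        k, v = problem_constants(*data)
        print(name, round(k, 6), round(v, 3))
```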


We are interested in the selection of an optimum value of r for these three problem types.
Equation 25 can be written as

\[ \langle \overline{e^2} \rangle_P = k \, \frac{r^2}{1 - r^2} + v \, \frac{1 - r}{1 + r} \tag{29} \]

where k and v are the constants for the specific problem type. The minimum of this function
over r can then be found by setting

\[ \frac{d \langle \overline{e^2} \rangle_P}{dr} = \frac{-2 v r^2 + (2k + 4v) r - 2v}{(1 - r^2)^2} = 0 \tag{30} \]

which yields two roots

\[ r_{1,2} = \frac{k + 2v}{2v} \pm \left[ \left( \frac{k + 2v}{2v} \right)^2 - 1 \right]^{1/2} \tag{31} \]


The minus sign yields a value of r between 0 and 1 and is also the minimum.

For the large-step problem Equation 31 gives an optimum r = 0.883. Using this value of
r in Equation 26, we have a corresponding minimum mean square error of 0.0190.

For the small-step problem Equation 31 gives an optimum r = 0.953. Using this value of
r in Equation 27, we have a corresponding minimum mean square error of 0.00931.

For the large- plus small-step problems Equation 31 gives an optimum r = 0.915. Using
this value of r in Equation 28, we have a corresponding minimum mean square error of 0.0147.
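
The optimum r for each problem type can be recomputed from the constants k and v. The sketch below takes the smaller root of the quadratic $v r^2 - (k + 2v) r + v = 0$ and evaluates the error there; the function names are assumptions made for illustration. Because the inputs are the rounded constants quoted above, the results should be expected to agree with the reported optima only to about two significant figures.

```python
import math


def optimum_r(k, v):
    """Root between 0 and 1 of v*r**2 - (k + 2v)*r + v = 0 (the minus-sign root)."""
    b = (k + 2.0 * v) / (2.0 * v)
    return b - math.sqrt(b * b - 1.0)


def mse_geometric(r, k, v):
    """Mean square error k * r**2 / (1 - r**2) + v * (1 - r) / (1 + r)."""
    return k * r ** 2 / (1.0 - r ** 2) + v * (1.0 - r) / (1.0 + r)


if __name__ == "__main__":
    constants = {"large": (0.00251, 0.162),
                 "small": (0.000465, 0.194),
                 "combined": (0.00134, 0.181)}
    for name, (k, v) in constants.items():
        r = optimum_r(k, v)
        print(name, round(r, 3), round(mse_geometric(r, k, v), 4))
```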


4.2.  A  MODEL  WITH  CONSTANT  WEIGHTING 

The second model gives a constant weight to each of N samples; that is, it is a simple
averaging model. The derivation of the response and errors for this model will parallel that
for the geometric model, and some of the detailed explanations will be omitted.

The weighting function is

\[ w_i = 1/N \tag{32} \]

where N is the number of samples in the average and the weight is 1/N to satisfy Equation 3.
In this case the transient response will be

\[ \bar{r}(n) = \frac{n}{N} P_2 + \frac{N - n}{N} P_1 = P_1 + \frac{P_2 - P_1}{N} \, n \tag{33} \]

and the steady-state response

\[ \bar{r}(n) = P_2 \tag{34} \]


The variance of r(n) during the transient phase will be the variance of the sum of N terms,
each with weight 1/N:

\[ \sigma_r^2(n) = \frac{N - n}{N^2} \, \sigma_1^2 + \frac{n}{N^2} \, \sigma_2^2 \tag{35} \]

The variance of r(n) during the steady-state phase will simply be

\[ \sigma_r^2(n) = \frac{\sigma_2^2}{N} \tag{36} \]

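Following the same procedures and arguments developed in the derivation of Equations 15
through 19, we have

\[ \frac{1}{N} \sum_{n=1}^{N} [\bar{r}(n) - P_2]^2 = \frac{1}{N} \sum_{n=1}^{N} \left[ P_1 + \frac{P_2 - P_1}{N} \, n - P_2 \right]^2 = \frac{(P_2 - P_1)^2}{N} \sum_{n=1}^{N} \left( 1 - \frac{n}{N} \right)^2 = (P_2 - P_1)^2 \left( \frac{1}{3} - \frac{1}{2N} + \frac{1}{6N^2} \right) \tag{37} \]

and analogous to Equation 20 we have

\[ \frac{1}{N} \sum_{n=1}^{N} \sigma_r^2(n) = \frac{1}{N} \sum_{n=1}^{N} \left[ \frac{N - n}{N^2} \, \sigma_1^2 + \frac{n}{N^2} \, \sigma_2^2 \right] = \frac{1}{N} \, \sigma_1^2 + \frac{N + 1}{2N^2} (\sigma_2^2 - \sigma_1^2) \tag{38} \]

The average mean square error during the transient phase is then

\[ \langle \overline{e^2} \rangle_T = (P_2 - P_1)^2 \left( \frac{1}{3} - \frac{1}{2N} + \frac{1}{6N^2} \right) + \frac{1}{N} \, \sigma_1^2 + \frac{N + 1}{2N^2} (\sigma_2^2 - \sigma_1^2) \tag{39} \]

The average mean square error during the steady-state phase is the variance, given by
Equation 36:

\[ \langle \overline{e^2} \rangle_{SS} = \frac{\sigma_2^2}{N} \tag{40} \]

The average mean square error over the whole subproblem is then

\[ \langle \overline{e^2} \rangle_{SP} = \frac{N}{M} \langle \overline{e^2} \rangle_T + \frac{M - N}{M} \langle \overline{e^2} \rangle_{SS} \tag{41} \]

As with the geometric weighted model, we are interested in the performance of this model
over the problems given to the subjects. The average mean square error over a problem is
given by

\[ \langle \overline{e^2} \rangle_P = \frac{1}{T} \sum_{i=1}^{S} M_i \left\{ \frac{(P_{i-1} - P_i)^2}{M_i} \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) + \frac{\sigma_i^2}{N} + \frac{1 - N}{2 M_i N} (\sigma_i^2 - \sigma_{i-1}^2) \right\} \tag{42} \]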


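Using the same arguments as those leading to Equation 25, we have

\[ \langle \overline{e^2} \rangle_P = \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) \frac{1}{T} \sum_{i=1}^{S} (P_{i-1} - P_i)^2 + \frac{1}{N} \, \frac{1}{S} \sum_{i=1}^{S} \sigma_i^2 \tag{43} \]

These sums are the same as those calculated for the geometric weighted model. For the
large-step problem we have

\[ \langle \overline{e^2} \rangle_{LSP} = 0.00251 \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) + 0.162 \, \frac{1}{N} \tag{44} \]

For the small-step problem,

\[ \langle \overline{e^2} \rangle_{SSP} = 0.000465 \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) + 0.194 \, \frac{1}{N} \tag{45} \]

And for the large- plus small-step problems,

\[ \langle \overline{e^2} \rangle_{L+S} = 0.00134 \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) + 0.181 \, \frac{1}{N} \tag{46} \]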


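The minima can again be selected by letting k and v be the constants for the particular
problem type

\[ \langle \overline{e^2} \rangle_P = k \left( \frac{N}{3} - \frac{1}{2} + \frac{1}{6N} \right) + \frac{v}{N} \tag{47} \]

and solving

\[ \frac{d \langle \overline{e^2} \rangle_P}{dN} = k \left( \frac{1}{3} - \frac{1}{6N^2} \right) - \frac{v}{N^2} = 0 \tag{48} \]

This yields

\[ N = \left( \frac{3v}{k} + \frac{1}{2} \right)^{1/2} \tag{49} \]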




For the large-step problem Equation 49 gives an optimum N = 14.1. Using this value of
N in Equation 44, we have a corresponding minimum mean square error of 0.0220.

For the small-step problem Equation 49 gives an optimum N = 35.3. Using this value of
N in Equation 45, we have a corresponding minimum mean square error of 0.0107.

For the large- plus small-step problems, Equation 49 gives an optimum N = 20.1. Using
this value of N in Equation 46, we have a corresponding minimum mean square error of 0.0172.
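
The optimum averaging length can be recomputed in the same way as the optimum r. The sketch below solves $k(1/3 - 1/(6N^2)) - v/N^2 = 0$ for N and evaluates the error there; the function names are assumptions for illustration. With the rounded constants quoted above, the large-step optimum comes out near 13.9 rather than the reported 14.1; the minimum is flat enough that the resulting error is essentially unchanged.

```python
import math


def optimum_n(k, v):
    """Stationary point of k*(N/3 - 1/2 + 1/(6N)) + v/N with respect to N."""
    return math.sqrt(3.0 * v / k + 0.5)


def mse_constant(N, k, v):
    """Mean square error of the constant-weight (running average) model."""
    return k * (N / 3.0 - 0.5 + 1.0 / (6.0 * N)) + v / N


if __name__ == "__main__":
    constants = {"large": (0.00251, 0.162),
                 "small": (0.000465, 0.194),
                 "combined": (0.00134, 0.181)}
    for name, (k, v) in constants.items():
        N = optimum_n(k, v)
        print(name, round(N, 1), round(mse_constant(N, k, v), 4))
```

In practice N must be an integer, so the nearest integers on either side of the stationary point would be compared.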

4.3.  A  DESCRIPTIVE  MODEL 

Inspection  of  data  on  the  subjects'  response  shows  that  they  did  not  perform  the  estimation 
task  as  smoothly  as  the  two  normative  models.  The  responses  were  characterized  by  rapid 
adjustments  separated  by  periods  of  little  or  no  movement.  This  evidence,  together  with 
thoughts  on  how  this  task  might  be  performed,  led  to  the  postulation  of  the  following  model  as 
an  attempt  to  describe  the  human  performance. 

This model operates as follows. The subject maintains a short running average of the
previous $k_3$ flashes. This average is of exactly the same type as the second normative model
discussed above. At each flash this average is compared with the existing setting of the
response lever and the difference noted. If this distance measure is greater than a prescribed
criterion level, the response is changed to a new value at some point intermediate between the
old response and the running average. If the difference is less than the criterion level, the
response remains unchanged.

Several  features  make  this  descriptive  model  attractive.  It  uses  the  lever  as  a  memory 
device,  moving  it  only  a  fraction  of  the  distance  to  the  new  average  and  thus  preserving  some  of 
the  information  in  the  previous  setting.  This  memory  function  permits  a  smaller  number  of 
flashes  in  the  running  average  than  would  otherwise  be  required  to  produce  the  levels  of  mean 
square  error  measured  from  the  subject's  responses.  The  criterion  level  corresponds  to  the 
concept  of  the  subjects'  smallest  perceptible  difference  between  the  running  average  and  the 
lever position. It permits the response to remain stationary during periods when the running
average  deviates  only  slightly  from  the  response. 

This model's operation can be thought of as a form of hypothesis testing. At each flash it
is testing the hypothesis that the running average is from a population described by the
response lever setting, using the criterion level as a form of significance measure. The subject's
performance is thus viewed as a succession of decision-making situations. This framework is
appropriate to the inclusion of higher mental processes than are involved in the usual manual
tracking task.

This model can be described mathematically as follows:

\[ u(n) = \frac{1}{k_3} \sum_{i=1}^{k_3} s_{n-i+1} \tag{50} \]

where u(n) is the running average of $k_3$ flashes, $s_n$. If r(n) is the current lever setting and

\[ |r(n) - u(n)| \le k_1 \tag{51} \]

where $k_1$ is the criterion band, then

\[ r(n + 1) = r(n) \tag{52} \]

If, however,

\[ |r(n) - u(n)| > k_1 \tag{53} \]

then

\[ r(n + 1) = r(n) + k_2 [u(n) - r(n)] \tag{54} \]

where $k_2$ is the fractional lever adjustment.
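
The update rule just described can be sketched in a few lines. This is an illustrative reading of the model and not the original IBM 709 program: the function name, the initial lever setting of 0.5, and the use of a shorter averaging window before $k_3$ flashes have occurred are all assumptions made here.

```python
def descriptive_model(samples, k1, k2, k3, initial_response=0.5):
    """Return the lever setting after each flash.

    samples: sequence of 0/1 flashes.
    k1: criterion band; k2: fractional lever adjustment; k3: flashes in the running average.
    """
    response = initial_response
    responses = []
    for n in range(len(samples)):
        # Running average of the most recent k3 flashes (fewer at the very start: an assumption).
        window = samples[max(0, n - k3 + 1): n + 1]
        u = sum(window) / len(window)
        if abs(response - u) > k1:
            # Outside the criterion band: move a fraction k2 of the way toward the average.
            response += k2 * (u - response)
        # Inside the band: the response is left unchanged.
        responses.append(response)
    return responses


if __name__ == "__main__":
    print(descriptive_model([1, 1, 1, 1, 1, 1], k1=0.05, k2=0.5, k3=2))
```

With a constant input the response closes half the remaining distance at each flash until it falls within the criterion band, after which it stays put.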

A fourth parameter, $k_4$, was also considered. It represents a time (flash) shift between
the subject's and the model's responses. The subject's response at n was compared to the
model's at $n - k_4$.

The four parameters are constrained to the following ranges:

\[ 0 \le k_1 \le 1 \tag{55} \]

where 0 yields adjustment decisions at each sample and 1 yields no adjustment decisions;

\[ 0 \le k_2 \le 1 \tag{56} \]

where 0 yields no response changes and 1 represents simply the following of the running
average whenever an adjustment decision is made; and

\[ 1 \le k_3 \le K, \qquad k_3 \text{ an integer} \tag{57} \]

where K is some reasonable maximum number of flashes that the subject could be expected
to assimilate in one averaging calculation. No definite values for K are known for this task.
It is certainly reasonable to assume that the flashes are not simply remembered as a succession
of binary symbols but are encoded into a larger symbol set, perhaps one depending on the lengths
of runs of one of the binary symbols. Considering the nature of the task and its difficulty, it
would seem unlikely that more than 20 flashes could be used in an averaging calculation; a
value closer to 10 would be more appropriate.

There are no particular constraints on $k_4$ except that $k_4 < 0$ implies subject prediction with
respect to the model.

Once the form of the model has been thus chosen, the task is to select parameter sets
($k_1$, $k_2$, $k_3$, $k_4$) which will make the model best describe the human performance. The criterion
used for this selection was the minimization of the mean square error between the subject's
and the model's responses over particular problems. This measure was selected as providing
the best single measure of performance, as was mentioned in Section 2. The selection of the
minimum mean square error for the criterion assures a fairly close fit to the transient portion
of the response where the error is large, at the possible expense of fit to the steady-state
portion.

The actual minimization process was carried out as follows. The model was programmed
on an IBM 709 computer. The computer was then fed four of the input problems used in the
experiment plus the responses of one of the four subjects to these problems. At each sample
point the squared difference between the subject's and the model's responses was calculated
and accumulated. The values obtained were simply printed out at the end of each
problem-parameter set combination. The large number of parameter sets possible and the possibility
of numerous minima precluded the use of an automatic searching technique for the minima.
Several computer runs were made in which previously selected parameter ranges were either
extended or filled in, according to the results of the previous run. The total variation of the
parameters was through the following ranges:

$0.02 \le k_1 \le 0.20$ (6 values)

$0.10 \le k_2 \le 0.90$ (7 values)

$1 \le k_3 \le 28$ (12 values)

$-2 \le k_4 \le 4$ (6 values)
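
The tabulation procedure described above amounts to evaluating the accumulated squared difference for every parameter set on a grid and inspecting the results. The sketch below shows one way this could be organized today. It is not the original program: the grid values are hypothetical (the report gives only the ranges and the number of values), the "subject" record is synthetic (generated by the model itself, so that a zero-error set is known to exist), and all function names are assumptions.

```python
import itertools
import random


def descriptive_model(samples, k1, k2, k3, initial_response=0.5):
    """Lever setting after each flash (same assumed reading of the model as in Section 4.3)."""
    response, out = initial_response, []
    for n in range(len(samples)):
        window = samples[max(0, n - k3 + 1): n + 1]
        u = sum(window) / len(window)
        if abs(response - u) > k1:
            response += k2 * (u - response)
        out.append(response)
    return out


def mean_square_difference(subject, model, k4):
    """Mean of [subject(n) - model(n - k4)]**2 over the samples where both are defined."""
    pairs = [(subject[n], model[n - k4])
             for n in range(len(subject)) if 0 <= n - k4 < len(model)]
    return sum((s - m) ** 2 for s, m in pairs) / len(pairs)


def tabulate_errors(samples, subject, k1_values, k2_values, k3_values, k4_values):
    """Return a list of (error, (k1, k2, k3, k4)) for every grid point."""
    table = []
    for k1, k2, k3 in itertools.product(k1_values, k2_values, k3_values):
        model = descriptive_model(samples, k1, k2, k3)
        for k4 in k4_values:
            table.append((mean_square_difference(subject, model, k4), (k1, k2, k3, k4)))
    return table


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic step input: probability 0.3 for 100 flashes, then 0.7 (illustrative only).
    samples = [1 if rng.random() < (0.3 if n < 100 else 0.7) else 0 for n in range(200)]
    subject = descriptive_model(samples, 0.12, 0.60, 12)     # synthetic "subject"
    table = tabulate_errors(samples, subject,
                            [0.02, 0.05, 0.12, 0.20], [0.10, 0.20, 0.60, 0.90],
                            [1, 8, 12, 28], [-2, 0, 2])
    best_error, best_set = min(table)
    print(best_error, best_set)
```

Keeping the whole table, rather than only the best entry, makes it possible to look for the valleys and separated minima discussed in the text.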

The four problems investigated were the large- and small-step problems, random constraint,
at 1 and 4 fps. The subject was S-2.

Several parameter sets with approximately equal minimum error measures were found
for each problem type. In each case these minima represented either a valley in the error
function or fairly distinct minima separated by regions of higher error. Table I shows the
various parameter sets and their corresponding minimum errors, $\langle e_{MS}^2 \rangle_P$. In each group
of parameter sets one can find various tradeoffs among the parameters which yield the
approximately equal error measures.

TABLE I. PARAMETER SETS FOR THE DESCRIPTIVE MODEL YIELDING
MINIMUM VALUES OF $\langle e_{MS}^2 \rangle_P$

                  Fractional
       Criterion  Adjustment  Memory  Lag
Set    k_1        k_2         k_3     k_4   <e_MS^2>_P   <e_M^2>_P   <2 e_M e_MS>_P   rho

Large Step, 1 fps
 1.    0.05       0.20         8       0    0.0104       0.0185       0.0030           0.109
 2.    0.12       0.60        12       0    0.0121       0.0198       0.0001           0.003

Small Step, 1 fps
 1.    0.05       0.10         6       0    0.0076       0.0130      -0.0066          -0.332
 2.    0.05       0.20        12       0    0.0079       0.0140      -0.0079          -0.375
 3.    0.08       0.20        10       0    0.0083       0.0130      -0.0073          -0.351
 4.    0.12       0.20         8       0    0.0086       0.0130      -0.0076          -0.359
 5.    0.15       0.40        10       0    0.0092       0.0120      -0.0072          -0.343

Large Step, 4 fps
 1.    0.05       0.10         8       2    0.0098       0.0240       0.0062           0.209
 2.    0.05       0.10        12       0    0.0098       0.0210       0.0092           0.321
 3.    0.10       0.10         8       2    0.0106       0.0240       0.0054           0.170
 4.    0.15       0.10         8       2    0.0117       0.0280       0.0003           0.008
 5.    0.15       0.30        12       2    0.0121       0.0300      -0.0021           0.060

Small Step, 4 fps
 1.    0.05       0.10        16       2    0.0102       0.0140      -0.0102          -0.426
 2.    0.05       0.10        24       0    0.0102       0.0160      -0.0122          -0.480
 3.    0.10       0.10        16       1    0.0102       0.0140      -0.0102          -0.426
 4.    0.10       0.30        24       0    0.0106       0.0140      -0.0106          -0.418
 5.    0.10       0.50        24       1    0.0106       0.0150      -0.0116          -0.460

The following method was devised as a means for selecting the best descriptive model from
among these parameter sets with approximately equal $\langle e_{MS}^2 \rangle_P$.

The subject's error, $e_S(n)$, can be written as

\[ e_S(n) = e_M(n) + e_{MS}(n) \tag{58} \]

where $e_M(n)$ is the model's error and $e_{MS}(n)$ is the error between the subject and the model.
Squaring this error, we have

\[ e_S^2(n) = e_M^2(n) + e_{MS}^2(n) + 2 e_M(n) e_{MS}(n) \tag{59} \]

The average value of this squared error over a particular problem is then

\[ \langle e_S^2 \rangle_P = \langle e_M^2 \rangle_P + \langle e_{MS}^2 \rangle_P + \langle 2 e_M e_{MS} \rangle_P \tag{60} \]

The minimization process used to select the parameter sets was concerned with finding
minimum values of $\langle e_{MS}^2 \rangle_P$. The computer also calculated values for the model's error, $\langle e_M^2 \rangle_P$.
$\langle e_S^2 \rangle_P$ was, of course, one of the measures made on the subject's performance. The term
$\langle 2 e_M e_{MS} \rangle_P$ can therefore be calculated from Equation 60.

These error terms can be interpreted in the following manner. Consider the subject's
error at any point in the sample sequence to have two components, one dependent in some
manner on the actual input samples and the other on phenomena not related to the input. The
first part of this error might be termed coherent, the second part noise. Consider now a
descriptive model and its relationship to these two error measures. If it performs the task
exactly as the subjects do, it will have an error, $\langle e_M^2 \rangle_P$, which is equal to the subject's
coherent, or sample-dependent, error; the error between this model and the subject, $\langle e_{MS}^2 \rangle_P$,
would then be equal to the subject's noise or sample-independent error. The subjects' noise
can be considered random fluctuations in the response about the coherent value. Since $\langle e_{MS} \rangle_P$
approaches zero over a large set of subproblems, then $\langle e_M e_{MS} \rangle_P$ will also approach zero.

If, on the other hand, the model does not represent the entire coherent part of the subject's
response, that is, if it is not a complete descriptor of the subject's coherent behavior, then
$e_{MS}(n)$ will be partially dependent on the sample series and therefore will be correlated with
$e_M(n)$. In this case the term $\langle 2 e_M e_{MS} \rangle_P$ will not approach zero. This correlation can
therefore be used as an additional selection device. It can be written in the normalized form

\[ \rho = \frac{\langle e_M e_{MS} \rangle_P}{\left( \langle e_M^2 \rangle_P \, \langle e_{MS}^2 \rangle_P \right)^{1/2}} \tag{61} \]

Table I shows the values of $\langle e_M^2 \rangle_P$, $\langle 2 e_M e_{MS} \rangle_P$, and $\rho$.
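
The normalized correlation can be recomputed from the tabulated error terms. Note that Table I lists the cross term with its factor of 2, so it is halved before normalizing. The function names below are assumptions introduced for illustration; the numerical inputs are the Table I entries for parameter set 1 of the large-step and small-step problems at 1 fps.

```python
import math


def normalized_correlation(cross_term_2x, model_mse, model_subject_mse):
    """rho = <e_M e_MS> / sqrt(<e_M^2> <e_MS^2>); cross_term_2x is the tabulated <2 e_M e_MS>."""
    return (cross_term_2x / 2.0) / math.sqrt(model_mse * model_subject_mse)


def subject_mse(model_mse, model_subject_mse, cross_term_2x):
    """Subject's mean square error as the sum of the three tabulated terms."""
    return model_mse + model_subject_mse + cross_term_2x


if __name__ == "__main__":
    # Large step, 1 fps, set 1: <e_MS^2> = 0.0104, <e_M^2> = 0.0185, <2 e_M e_MS> = 0.0030
    print(normalized_correlation(0.0030, 0.0185, 0.0104))
    # Small step, 1 fps, set 1: <e_MS^2> = 0.0076, <e_M^2> = 0.0130, <2 e_M e_MS> = -0.0066
    print(normalized_correlation(-0.0066, 0.0130, 0.0076))
```

Both values agree with the tabulated $\rho$ to within rounding of the three-decimal entries.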


The normalized correlation, $\rho$, provides a measure giving good discrimination among
the parameter sets for the large-step problems. For the large-step problem at 1 fps, parameter
set 2 has a value of $\rho$ which is essentially zero. At 4 fps, parameter set 4 has a very
low value for $\rho$. Neither of the small-step problems, however, produces a correlation which
discriminates among the parameter sets or which is as small as that found for the large-step
problem. It would appear on the basis of this evidence that the postulated descriptive model
represents the subject's performance on the large-step problems better than on the small-step
ones.


Zero correlation, as defined by Equation 61, does not necessarily imply a complete lack
of dependence of $e_{MS}(n)$ on the sample series. Two hypotheses could be used to explain the
fairly large $\langle e_{MS}^2 \rangle_P$ which remained even for $\rho = 0$. One would simply be that this level of
noise did exist in the subjects' performance. The other would be that this "noise" component
had at least some portion which was related to the sample series but which was not correlated
with $e_M(n)$. Perhaps one reason for a fairly large noise component would be variations in the
subjects' method of performing the task during the problem run.

4.4.  A  NORMATIVE  VARIATION 

The descriptive model discussed above was constructed as an approximation to the human
performance on this task. It is also interesting to see how well this model form can do if the
parameters are selected to give a minimum $\langle e_M^2 \rangle_P$, to be normative in the same sense as
the models with geometric and constant weighting. Normative parameter sets for the large-
and small-step problems were found by using the computer to calculate $\langle e_M^2 \rangle_P$ and converging
on the minimum value by successive selection of the parameter sets as in the selection
of the minimum $\langle e_{MS}^2 \rangle_P$ for the descriptive models. For this selection $k_4$ was set equal to
zero.

The large-step problem yielded one distinct and interesting minimum: $k_1 = 0.02$, $k_2 = 0.10$,
and $k_3 = 1$. All three of these parameters are the smallest values examined, and this
minimum is in one corner of the error surface. This model would operate as follows: with
$k_3 = 1$, the running average would have values of either 0 or 1, depending on the most recent
sample; with $k_1 = 0.02$, there would be a response adjustment at every sample except when
the response was within 0.02 of either 0 or 1. This adjustment would be 0.10 of the distance
between the previous response and 0 or 1. The root mean square error for this model was 0.124.

The best normative parameter set for the small-step problem was found to be $k_1 = 0.20$,
$k_2 = 0.10$, and $k_3 = 6$. Again we have the minimum occurring at the smallest value of $k_2$, but
in this case the criterion for changing the response is fairly high. We have six samples in
the memory. The root mean square error for this model was 0.099.

Both the geometric and the constant weighted models are included as special cases of this
descriptive model. When $k_3 = 1$ and $k_1 = 0$ the descriptive model is identical with the
geometric weighted model with $r = 1 - k_2$. When $k_1 = 0$ and $k_2 = 1$ the model is identical with
the constant weighted model with $N = k_3$. The best normative parameter set for the large-step
problem deviates from the simple geometric form only when the response is within
0.02 of either 0 or 1. The best normative parameter set for the small-step problem does
not yield as low an error as the optimum of either the geometric or the constant weighted
model. The equivalent parameter sets for these models were outside the parameter range
investigated, however.

It would seem that this decision model, within its restricted parameter sets, represents
a reasonable method for performing this task when the step changes are large, but not when
they are small. It is interesting in this light to note that the decision model did not seem to
describe the subject's performance on the small-step problem as well as on the large-step
problem.
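
The two special cases of the decision model (a zero criterion band with a one-flash memory, and a zero criterion band with full adjustment) can be verified numerically. The sketch below is an illustration under stated assumptions: the model function repeats the assumed reading of Equations 50 through 54 (including a shortened window at the start of the sequence), the geometric model is written in its recursive form with unbounded memory, and all names and the sample sequence are hypothetical.

```python
def descriptive_model(samples, k1, k2, k3, initial_response=0.5):
    """Lever setting after each flash for the decision model."""
    response, out = initial_response, []
    for n in range(len(samples)):
        window = samples[max(0, n - k3 + 1): n + 1]
        u = sum(window) / len(window)
        if abs(response - u) > k1:
            response += k2 * (u - response)
        out.append(response)
    return out


def geometric_recursion(samples, r, initial_response=0.5):
    """Recursive form of geometric weighting: new = r * old + (1 - r) * sample."""
    response, out = initial_response, []
    for s in samples:
        response = r * response + (1.0 - r) * s
        out.append(response)
    return out


def running_average(samples, N):
    """Mean of the most recent N samples (fewer at the start of the sequence)."""
    out = []
    for n in range(len(samples)):
        window = samples[max(0, n - N + 1): n + 1]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    flashes = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1]
    # k3 = 1, k1 = 0: geometric weighting with r = 1 - k2.
    print(descriptive_model(flashes, 0.0, 0.1, 1)[-1], geometric_recursion(flashes, 0.9)[-1])
    # k1 = 0, k2 = 1: constant weighting with N = k3.
    print(descriptive_model(flashes, 0.0, 1.0, 4)[-1], running_average(flashes, 4)[-1])
```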

Figures 21 and 22 show a small representative portion of three responses to the same
input sample sequence. The normative model at the top is the parameter set selected above
as the best normative set for the decision model. Note the rapid response changes of the
large-step model. The center response is that of two of the descriptive parameter sets, and
the lower response is the subject's. The descriptive parameter set for the large-step problem
is the one with the low value of $\rho$. The set for the small-step problem was selected
somewhat arbitrarily as one of the five sets that seemed like a reasonable description. The
fairly high coherent subject's error is clearly evident in these figures.


FIGURE 21. RESPONSES OF TWO MATHEMATICAL
MODELS AND A SUBJECT TO A PORTION OF A
LARGE-STEP PROBLEM, RANDOM CONSTRAINT,
AT 1 FPS. Normative Model K1 = 0.02, K2 = 0.10,
K3 = 1; Descriptive Model K1 = 0.12, K2 = 0.60,
K3 = 12, K4 = 0.

FIGURE 22. RESPONSES OF TWO MATHEMATICAL MODELS
AND A SUBJECT TO A PORTION OF A SMALL-STEP PROBLEM,
RANDOM CONSTRAINT, AT 1 FPS. Normative Model
K1 = 0.20, K2 = 0.10, K3 = 6; Descriptive Model K1 = 0.15,
K2 = 0.40, K3 = 10, K4 = 0.


5

DISCUSSION 

The response measures presented in Section 3 do not directly indicate the quality of the
performance. Quantitative standards are necessary for the measures. Mean error is an
exception in that a standard of zero is reasonable and was in fact achieved. The normative
models derived in Section 4 provide the standards for the other measures. They permit
comparison between the subjects' performance and that of several simple machines.

Several important differences exist between the subjects' and the models' knowledge of
the task. The subjects were not instructed on the step-function nature of the input. In fact
they were specifically told to expect slow, continuous changes in the probability. The models,
on the other hand, were optimized for step input functions. It is reasonable to assume,
however, that the subjects' original ignorant and misinformed state did not persist for long
after the tracking began. The rapid performance asymptote (less than 45 minutes) and the
discrimination between the small- and large-step problems attest to this. The model does
not have learning and adaptive abilities, of course, and it was therefore given the maximum
knowledge that the subject could theoretically derive from the task. The model-subject
comparison thus includes the subjects' learning and adaptive abilities. This method of
subject instruction will allow more valid generalization of the measured estimation ability
to other input forms.

The same situation exists in the relative knowledge of the input statistics possessed by
the subjects and the models. The models were completely informed of the distributions of
step size, step direction, subproblem length, and probability. The subjects knew nothing of
these initially, but it can again be assumed that they learned much about them while performing.
The adaptation to the small- and large-step problems is an example of the subjects'
distinguishing between two distributions of step size.

The models were provided with a definite criterion for optimum performance, the minimum
mean square error. The subjects were instructed to use the same criterion. The
actual criteria used by the subjects, however, correspond to their conception of best performance
and are a function of the instructions, of performing the task, and of personal abilities
and sensitivities.

The subjects and the models will be compared by means of the following measures:
detection, D; convergence, C; root mean square error, RMSE; and root mean square error
after convergence, $RMSE_C$. These four measures, plus mean error, provide a fairly complete
description of the performance.

Detection and convergence were calculated by using Equations 10 and 33. These measures
are for the expected values of the responses and are not the expected values of detection and
convergence. The difference is not important for this comparison.

RMS  error  was  calculated  by  using  Equations  23  and  41.  RMS  error  after  convergence 
was  calculated  by  using  Equations  22  and  40  with  the  addition  of  a  correction  for  the  small 
error  contributed  by  the  remaining  transient  after  convergence.  This  transient  error  was 
calculated  by  using  Equations  18  and  37. 

Two values of RMSE_C were calculated, one with an infinite sample population as implied
in Section 4 and one with finite populations such as those used in the experiment. The correction
factor for the finite populations is derived in Appendix B. It was calculated for an
average number of flashes for the subproblems and corresponds to the random problem type.

Two specific step sizes were selected for the comparison: 0.40 for the large-step problem
and 0.15 for the small-step one. These are approximately the average respective step
sizes. These step changes are examined at a flash rate of 1 fps and with the random problem
type.

Three forms of each normative model are used. Two of these correspond to the optimum
models selected for separate consideration of the large- and small-step problems. They are
called "discriminating" models. The third model is called "nondiscriminating," since it is
required to be optimum over both the small- and the large-step problems simultaneously.
The parameters for these models were calculated in Section 4. The nondiscriminating models
represent the only case where the models are not provided the complete statistical information.

Measurements of the performance of models and subjects are shown in Table II. Also
included are values of detection and convergence for the descriptive models shown in Figures
21 and 22. These measures are averages over a set of subproblems with average step sizes
of approximately 0.40 and 0.15. RMSE and RMSE_C were not available from the descriptive
model's data.

The response speed of the nondiscriminating model lies between the rapid response of
the large-step discriminating models and the smoothing responses of the small-step discriminating
models. The discriminating models have lower values of RMSE, of course, since
this was the optimization criterion. The nondiscriminating models have an RMSE_C between
those of the two discriminating models; it is lower for the large-step problem and higher for
the small-step problem.
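
The difference in response speed can be made concrete with a short sketch. It is not the report's own computation. It assumes that the geometric model has weights (1 - r) r^(i-1) over an unlimited past, that the constant model averages the last N flashes, and that detection and convergence are counted when the expected response has moved 0.05 away from the old probability and to within 0.05 of the new one. The 0.05 band and the function names are assumptions made for illustration (the report's definitions are Equations 10 and 33); under these assumptions the values come out close to the geometric and constant model entries of Table II.

```python
import math

BAND = 0.05  # assumed criterion band in probability units (not taken from the report)


def geometric_flashes(r, step, band=BAND):
    """Flashes for the expected geometric-model response to leave the old
    probability by `band` (detection) and to come within `band` of the new
    one (convergence). Expected response after n flashes: step * (1 - r**n)."""
    detection = math.log(1 - band / step) / math.log(r)
    convergence = math.log(band / step) / math.log(r)
    return detection, convergence


def constant_flashes(n_avg, step, band=BAND):
    """Same two counts for an average over the last `n_avg` flashes, whose
    expected response rises linearly: step * n / n_avg for n <= n_avg."""
    detection = n_avg * band / step
    convergence = n_avg * (1 - band / step)
    return detection, convergence


if __name__ == "__main__":
    for r, step in [(0.883, 0.40), (0.915, 0.40), (0.953, 0.15), (0.915, 0.15)]:
        d, c = geometric_flashes(r, step)
        print(f"geometric r={r}, step={step}: detection {d:.1f}, convergence {c:.1f}")
    for n_avg, step in [(14, 0.40), (20, 0.40), (35, 0.15), (20, 0.15)]:
        d, c = constant_flashes(n_avg, step)
        print(f"constant N={n_avg}, step={step}: detection {d:.1f}, convergence {c:.1f}")
```

A larger r or N spreads the response over more flashes, which is why the small-step discriminating models detect and converge more slowly than the large-step ones.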

The subject-model comparison shows a striking difference in the detection values for the
large-step problem. The normative models have detection values considerably smaller than
the subjects'. This results, to a large extent, from the difference between the models' smooth
response and the criterion-testing nature of the subjects' response hypothesized in Section 4.
The normative models begin to respond to the step change with the first flash of the new probability.
The subjects require a number of flashes to perceive a significant probability change
and the necessity of a response change. The large-step descriptive model has a detection value
comparable to the subjects'.

On the small-step problem the subjects' detection value is higher than any of the models',
although it is comparable to the discriminating model.

In  convergence,  however,  the  subjects  performed  comparably  to  the  models.  On  the 
large-step  problem  only  the  discriminating,  constant  model  has  a  smaller  value.  On  the 
small-step  problem  the  subjects'  convergence  value  lies  between  those  of  the  discriminating 
and  nondiscriminating  models. 

The hypothesis that the subjects were adapting to the difference between the small- and
large-step problems receives support from the convergence comparisons. The nondiscriminating
models show a considerable decrease in convergence from the large- to the small-step
problems. The discriminating models show an increase in convergence from the large-
to the small-step problem as they change to a smoother response form. The subjects showed
a similar slight increase in convergence from the large- to the small-step problems.

TABLE II. COMPARISON OF THE PERFORMANCES OF THE SUBJECTS
AND THE MATHEMATICAL MODELS

                                                                    RMSE_C (Probability)
                          Detection   Convergence       RMSE        Infinite     Finite
                          (Flashes)    (Flashes)    (Probability)  Population  Population

Large-Step Problem (step = 0.40)
  Geometric    r = 0.883     1.1         16.7           0.143         0.101       0.087
              *r = 0.915     1.5         23.4           0.148         0.086       0.067
  Constant     N = 14        1.8         12.2           0.155         0.108       0.093
              *N = 20        2.5         17.5           0.161         0.090       0.072
  Subjects (1 fps)           7.5         15.0           0.170               0.091
  Descriptive Model          4.2         14.2

Small-Step Problem (step = 0.15)
  Geometric    r = 0.953     8.4         22.8           0.094         0.073       0.042
              *r = 0.915     4.6         12.4           0.103         0.094       0.073
  Constant     N = 35       11.7         23.3           0.100         0.076       0.048
              *N = 20        6.7         13.3           0.110         0.099       0.079
  Subjects (1 fps)          12.5         17.5           0.112               0.095
  Descriptive Model          7.2         21.0

*r = 0.915 and N = 20 are the nondiscriminating models.

The subjects' delayed detection with comparable convergence illustrates the discontinuous
nature of their behavior. Although unable, or unwilling, to indicate the presence of a
change in the probability for the first seven to twelve flashes, they were then able
to converge on the new probability in five to seven more flashes.

The subjects were slightly higher than the models in RMSE, mainly because they made
many errors during the predetection period.

The subjects compare favorably in RMSE_C with the infinite population values but are
poorer than all but one of the finite population values. The introduction of the finite population
correction caused an appreciable drop in RMSE_C, particularly for the models with long
averages. It appears then, by comparison, that the subjects were not fully utilizing the
series constraint. On the average, their RMSE_C dropped only about 0.014 from the random
problem type, with an average population of close to 60, to the constrained problem type,
where the population was only 17.

In comparison with these models the subjects seem fairly adept at converging on a new
probability after they decided a change had occurred. This aspect of the task may well have
received the most attention. Concentration on this would lead to increased RMSE_C, because
of false decisions during the static portion of the subproblem. This represents a deviation
from the explicit instructions.


CONCLUSION 

The human performance on this task was considerably better than expected. Two features
distinguish this task from those used in other investigations of probability estimation. One
is the dynamic set under which the subjects were performing. This set for changing probabilities
was probably induced primarily by the subjects' actual experience in estimating the
dynamic probabilities. The change in behavior from the large- to the small-step problems
could be viewed as a partial loss of this dynamic set.

The second distinguishing feature was the display and response mechanisms. The particular
arrangement of lights, scale, and lever probably had a high stimulus-response compatibility.

It seems unlikely that probability estimation is, or at least need be, the limiting factor
in human binary decision making. Furthermore, it is reasonable to inquire into probability
estimation as a possible useful function of man in future man-machine systems requiring the
use of information from uncertain or probabilistic sources.


Appendix A

INPUT PROBABILITY GENERATION

The input step sequence was generated by exhausting the step changes systematically, using
Table III. The generation procedure was as follows. All problems were started at P = 0.50,
the row identified as "probability from," 0.50. The table entries are step sizes, those to the
right of the diagonal being positive and to the left negative. One of the step changes in the 0.50
row was selected at random. This step selection led to a new probability, "probability to."


TABLE III. PROBABILITIES USED TO GENERATE SEQUENCES OF FLASHES

Large-Step Problem (steps + and - 0.16, 0.32, 0.48, and 0.64)

Probability                    Probability To
From         0.02   0.18   0.34   0.50   0.66   0.82   0.98
0.02          -     0.16   0.32   0.48   0.64    -      -
0.18         0.16    -     0.16   0.32   0.48   0.64    -
0.34         0.32   0.16    -     0.16   0.32   0.48   0.64
0.50         0.48   0.32   0.16    -     0.16   0.32   0.48
0.66         0.64   0.48   0.32   0.16    -     0.16   0.32
0.82          -     0.64   0.48   0.32   0.16    -     0.16
0.98          -      -     0.64   0.48   0.32   0.16    -

Small-Step Problem (steps + and - 0.06, 0.12, 0.18, and 0.24)

Probability                                  Probability To
From         0.08   0.14   0.26   0.32   0.44   0.50   0.56   0.68   0.74   0.86   0.92
0.08          -     0.06   0.18   0.24    -      -      -      -      -      -      -
0.14         0.06    -     0.12   0.18    -      -      -      -      -      -      -
0.26         0.18   0.12    -     0.06   0.18   0.24    -      -      -      -      -
0.32         0.24   0.18   0.06    -     0.12   0.18   0.24    -      -      -      -
0.44          -      -     0.18   0.12    -     0.06   0.12   0.24    -      -      -
0.50          -      -     0.24   0.18   0.06    -     0.06   0.18   0.24    -      -
0.56          -      -      -     0.24   0.12   0.06    -     0.12   0.18    -      -
0.68          -      -      -      -     0.24   0.18   0.12    -     0.06   0.18   0.24
0.74          -      -      -      -      -     0.24   0.18   0.06    -     0.12   0.18
0.86          -      -      -      -      -      -      -     0.18   0.12    -     0.06
0.92          -      -      -      -      -      -      -     0.24   0.18   0.06    -

This new probability was in turn selected in the "probability from" list and a step change from
it selected randomly. The step selections were made without replacement. This procedure
was continued until the entire table was exhausted. It was necessary to constrain the random
selection at times in order to exhaust the table without repeating steps. This selection method
gave a "problem" with exactly one step of each size and direction to each probability. The
large-step problems and the small-step problems were produced by separate tables.
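
One way to picture the exhaustion procedure: every table entry is a directed step from one probability to another, and a "problem" is a path that starts at 0.50 and uses each entry exactly once. The sketch below builds such a path with a randomized form of Hierholzer's algorithm. This is a substitute for the constrained random selection described above, not the procedure actually used. Probabilities are held as integer percentages, and the names `step_table` and `exhaust_table` are illustrative.

```python
import random

LARGE = [2, 18, 34, 50, 66, 82, 98]                  # Table III, large-step (percent)
SMALL = [8, 14, 26, 32, 44, 50, 56, 68, 74, 86, 92]  # Table III, small-step (percent)


def step_table(probs, max_step):
    """Map each 'probability from' to the 'probability to' values it can reach."""
    return {p: [q for q in probs if q != p and abs(q - p) <= max_step]
            for p in probs}


def exhaust_table(table, start, rng):
    """Return a sequence of probabilities that begins at `start` and uses
    every (from, to) entry of the table exactly once."""
    unused = {p: rng.sample(qs, len(qs)) for p, qs in table.items()}  # shuffled copies
    stack, circuit = [start], []
    while stack:
        here = stack[-1]
        if unused[here]:
            stack.append(unused[here].pop())  # take an unused step out of `here`
        else:
            circuit.append(stack.pop())       # no steps left here: back up
    circuit.reverse()
    return circuit


if __name__ == "__main__":
    rng = random.Random(0)
    path = exhaust_table(step_table(LARGE, 64), 50, rng)
    print(len(path) - 1, "steps:", [p / 100 for p in path])
```

Because every probability in these tables has as many steps arriving as leaving, a path built this way also ends at 0.50; whether the original problems ended there is not stated.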

The  number  of  flashes  at  each  probability  was  selected  randomly  from  the  set:  42,  54,  66, 
and  78  for  the  small-step  problems,  and  35,  51,  74,  and  89  for  the  large-step  problems.  For 
the  constrained  problems,  both  large-  and  small-step,  the  values  were  multiples  of  17:  34,  51, 
68,  and  85. 

Five  problems  were  generated  from  each  table,  one  for  each  rate.  The  same  series  of 
steps  was  used  for  the  random  and  constrained  problem  types. 


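Appendix B

VARIANCES OF SAMPLE AVERAGES FROM FINITE POPULATIONS

Consider a population $y_k$ with mean $Y$ and with $M$ members. Let $N$ samples $x_i$ be drawn
from $y_k$ with

$$\bar{x} = \sum_{i=1}^{N} w_i x_i ,$$

where

$$\sum_{i=1}^{N} w_i = 1 .$$

The variance of $\bar{x}$ is

$$E[(\bar{x} - Y)^2] = E\left[\left(\sum_{i=1}^{N} w_i x_i - Y\right)^2\right]
= E\left[\sum_{i=1}^{N} w_i (x_i - Y) \sum_{j=1}^{N} w_j (x_j - Y)\right]
= \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j E[(x_i - Y)(x_j - Y)] .$$

This expression contains terms which can be written as

$$E(x_i - Y)^2 = \frac{1}{M} \sum_{k=1}^{M} (y_k - Y)^2$$

and

$$E(x_i - Y)(x_j - Y) = \frac{1}{M(M-1)} \left[\sum_{k=1}^{M} \sum_{l=1}^{M} (y_k - Y)(y_l - Y) - \sum_{k=1}^{M} (y_k - Y)^2\right]
= -\frac{1}{M(M-1)} \sum_{k=1}^{M} (y_k - Y)^2 , \quad i \neq j .$$

The variance then becomes

$$E(\bar{x} - Y)^2 = \sum_{i=1}^{N} w_i^2 \, \frac{1}{M} \sum_{k=1}^{M} (y_k - Y)^2
- \frac{1}{M(M-1)} \left[\sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j - \sum_{i=1}^{N} w_i^2\right] \sum_{k=1}^{M} (y_k - Y)^2$$

$$= \left[\frac{M}{M-1} \sum_{i=1}^{N} w_i^2 - \frac{1}{M-1}\right] \frac{1}{M} \sum_{k=1}^{M} (y_k - Y)^2 .$$

For a subproblem with mean $Y$ and length $M_i$ we have

$$\sigma_i^2 = \frac{1}{M_i} \sum_{k=1}^{M_i} (y_k - Y)^2 ,$$

and we can then write

$$\sigma^2(n) = E[(\bar{x} - Y)^2] = \left[\frac{M_i}{M_i - 1} \sum_{i=1}^{N} w_i^2 - \frac{1}{M_i - 1}\right] \sigma_i^2 .$$

For the geometric weighted model with $w_i = a r^{i-1}$, the sum of squared weights is

$$\sum_{i=1}^{N} w_i^2 = a^2 \, \frac{1 - r^{2N}}{1 - r^2} .$$

For the constant weighted model with $w_i = 1/N$,

$$\sum_{i=1}^{N} w_i^2 = \frac{1}{N} .$$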
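
As a check on the finite-population result, the following sketch compares the bracketed expression with a brute-force average over every ordered draw without replacement from a small population. The population, the weights, and the function names are made up for illustration, and the weights are normalized so that they sum to one, as the derivation requires.

```python
from itertools import permutations


def formula_variance(weights, population):
    """Variance of the weighted sample average from the finite-population
    expression [M/(M-1) * sum(w^2) - 1/(M-1)] * sigma^2."""
    m = len(population)
    mean = sum(population) / m
    sigma2 = sum((y - mean) ** 2 for y in population) / m
    sum_w2 = sum(w * w for w in weights)
    return (m / (m - 1) * sum_w2 - 1 / (m - 1)) * sigma2


def brute_force_variance(weights, population):
    """Same variance by enumerating all equally likely ordered draws of
    len(weights) members without replacement."""
    mean = sum(population) / len(population)
    draws = list(permutations(population, len(weights)))
    total = 0.0
    for draw in draws:
        x_bar = sum(w * x for w, x in zip(weights, draw))
        total += (x_bar - mean) ** 2
    return total / len(draws)


def geometric_weights(r, n):
    """Weights a * r**(i-1), i = 1..n, with a chosen so that they sum to one."""
    a = (1 - r) / (1 - r ** n)
    return [a * r ** i for i in range(n)]


if __name__ == "__main__":
    flashes = [1, 1, 1, 0, 0, 1, 0]  # hypothetical subproblem of 7 flashes
    for name, w in [("constant", [1 / 4] * 4),
                    ("geometric", geometric_weights(0.9, 4))]:
        print(name, formula_variance(w, flashes), brute_force_variance(w, flashes))
```

For constant weights the expression reduces to the familiar correction (M - N) / (N (M - 1)) times the population variance, so the variance falls to zero when the whole population is sampled.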

Appendix C

ORDER OF PRESENTATION

The problems were presented to the subjects in the following order,
where L, S = large- and small-step problems, respectively
      R, C = random and constrained problems
      1-5 after R or C = the particular problem
      Part 1, 2, 3, 4 = divisions of a particular problem

Session   Problem   Part   Rate (fps)
   1        LR1      -        2
            LC1      1        1
   2        SC1      -        4
            SR1      -        8
            LC2      1        0.5
   3        LR2      1        1
            SR2      -        2
   4        LR3      1        0.5
            LR2      2        1
            LCS      -        8
   5        SR3      -        4
            LC2      2        0.5
            LR5      -        4
   6        SC2      1        1
            SR4      1        1
   7        SC3      1        0.5
            SR5      1        0.5
   8        LCS      -        2
            SR5      2        0.5
   9        SC3      2        0.5
            LC4      -        4
            LR4      -        8
  10        SR5      3        0.5
            SR4      2        1
  11        LC2      3        0.5
            SC4      -        8
            SC3      3        0.5
  12        SR5      4        0.5
            SC5      -        2
  13        LC2      4        0.5
            LR3      2        0.5
            LR3      3        0.5
  14        SC3      4        0.5
            LC1      2        1
  15        LR3      4        0.5
            SC2      2        1


Appendix D

INSTRUCTIONS

The following formal instructions were used. The instruction method is discussed in
Section 2.

"This experiment is concerned with your ability to estimate probabilities and to follow
changes that occur in them as time passes. You will see a display of two lights, a left and a
right light. At each flash one or the other of the lights will light, indicating right or left. This
is exactly analogous to the drawing at regular intervals of red and green balls from a jar. You
will be asked to estimate, by setting a dial, your best guess as to the percentage of balls that
are right. The dial is calibrated from 0 to 100 representing no right to all right flashes. For
example, if you think that about 68% of the flashes are right then set the dial at 68. The actual
percentages cover the entire range from 0 to 100 and have all values in between. The percentages
do not necessarily fall on the dial markings.

"The important new work to come out of this experiment is your ability to notice changes
in the percentages and to follow the changing percentage with the dial setting. The analogy with
the balls in the jar is the case where one or the other color is being taken out of the jar by another
person without your knowledge. At times the percentage will change slowly in a continuous
fashion. At other times the percentage will change suddenly, as though a whole handful of
one color had been removed. If you are uncertain as to the percentage set the dial at 50.

"You will be paid according to how well you do. At the end of each problem, 10 to 25 minutes,
you will be able to read the amount of money off the meter on the computer. The computer
calculates the difference between your estimate and the actual probability, the error, and
accumulates this error over the problem. It also adds up a constant amount of money per minute.
You are paid the difference. The computer is adjusted so that if you left the lever at 50
you would get no money.

"You will wear a pair of earphones and have a microphone. A low 'seashore' type noise
will be fed into the earphones in order to mask out noises from the street and the laboratory.
When I talk to you the noise will be removed. You can be heard at all times through your
microphone. You are welcome to make verbal comments during the experiment. These are not
being recorded and any sort of language is acceptable."


Appendix E

TWO QUALITATIVE RESPONSE EXCEPTIONS

On  two  occasions  during  approximately  70  hours  of  tracking,  the  tracking  response  was 
qualitatively  variant  from  the  norm.  These  two  situations  lasted  for  a  total  of  approximately 
35  minutes. 

The first occurred during the pilot experiment. During one particular problem in the third
session a subject was accumulating error at a much higher rate than in any of the previous
sessions or problems. Inspection of her records showed that detection was considerably higher
than it had been before. The instructions concerning the error formation and the payoff were
repeated with special emphasis on the rapid error buildup with large discrepancies between
probability and response. Her response returned to normal on the next problem.

The  hypothesis  here  is  that  she  was  computing  the  new  probability  to  a  high  degree  of 
accuracy  before  she  responded  to  the  change.  The  "normal"  response  produces  movement 
toward  the  new  probability  as  soon  as  it  is  perceived,  with  further  refinements  as  more  data, 
flashes,  are  accumulated. 

The second anomaly occurred in the response of a subject in his twelfth session of the main
experiment. He was tracking a large-step problem at 2 fps. The experimenter noted that the
payoff was going negative; the error accumulation was faster than the pay accumulation. Upon
examining the records it was established that for about the first 3/4 of the problem, about 15
minutes, the response was the mirror image of the proper or normal response. The scale was
reversed in relation to the light flashes. A check on the equipment failed to reveal any malfunction.
Upon questioning after the session the subject stated that he was a bit mixed up at times.
He evidently had no idea that he was doing a fairly good job of mirror-image tracking.

He was given this particular problem again in a special sixteenth session, and this second
run was used in the analysis.


Appendix  F 

DATA NOT AVERAGED OVER SUBJECTS

Figures 23 through 29 show some of the principal variable interactions for individual subjects.
The data for detection and convergence show appreciable magnitude variations between
subjects but maintain the same qualitative relationships in direction of change and the distinction
between small-step and large-step problems. Subject S-1 was quite consistently slower in
his response than the other three. All subjects show a similar increase in RMSE with rate
from 1 to 8 fps. Subject S-3 is consistently higher. All subjects show a decrease in RMSE_C,

FIGURE 25. CONVERGENCE AS A FUNCTION OF STEP SIZE FOR FOUR SUBJECTS.
Convergence is measured in flashes.

FIGURE 28. ROOT MEAN SQUARE ERROR AFTER CONVERGENCE, AS A FUNCTION
OF FLASH RATE FOR FOUR SUBJECTS